Datasets for value, with micro-economic variables concerning Swiss companies

Deutsch: Datensätze für Nutzen, mit mikro-ökonomischen Variablen über Schweizerische Unternehmen
Français: Jeux de données contenant des variables micro-économiques d’entreprises suisses

Hello to the whole community,

Several research projects are currently under way at the University of Bern. One of their ultimate goals is to quantify the economic effect of Open Government Data (OGD) in Switzerland, after a phase of theoretical development, using econometric and machine learning methods.

Does anyone know of datasets containing economic variables on organizations in Switzerland? I am looking for variables on their performance and characteristics (such as turnover, investment level, …), if possible together with a variable related to OGD (but datasets without one are also welcome). The preferred level of aggregation is the individual firm, but the communal or cantonal level would also be useful.

It would be really useful to merge several datasets in order to quantify some of the benefits of OGD, which has not yet been done in the scientific literature. Recommendations could then be made to foster sustainable value for everyone.

For example, I have already found the following relevant statistics from the Federal Statistical Office. For businesses:

  1. Structural Business Statistics (STATENT)
  2. Business demography statistics
  3. Enterprise group statistics STAGRE
  4. (Customs data)
  5. (VAT registers)
  6. Business and establishment register BUR
  7. (Debt collection and bankruptcy statistics)

For individuals:

  1. Swiss Labour Force Survey ESPA
  2. Structural survey RS
  3. Swiss Survey on the Structure of Earnings ESS
  4. Value added statistics
  5. Statistics on Income and Living Conditions SILC
  6. (Statistics on the turnover of services)
  7. Employment statistics STATEM


Hallo an die gesamte Gemeinschaft,

An der Universität Bern gibt es mehrere Forschungsprojekte, die unter anderem zum Ziel haben, die wirtschaftlichen Auswirkungen von Open Government Data OGD in der Schweiz (nach einer gewissen theoretischen Entwicklung) mit ökonometrischen und machine learning Methoden zu quantifizieren.

Kennt jemand Datensätze mit mikroökonomischen Variablen zu Organisationen in der Schweiz? Dies auf der Ebene ihrer Leistung und ihrer Merkmale (wie Umsatz, Höhe der Investitionen, …) und wenn möglich mit einer Variablen, die sich auf OGD bezieht (aber auch ohne). Die Aggregationsebene ist vorzugsweise die Ebene der einzelnen Unternehmen, aber auch die kommunale oder kantonale Ebene wäre sinnvoll.

Es würde wirklich wertvoll sein mehrere Datensätze zusammenzuführen, um einen Teil des Nutzens von OGD zu quantifizieren, was in der wissenschaftlichen Literatur noch nicht gemacht ist. Da könnten Empfehlungen gebildet werden, um die OGD-Vorteile für alle zu fördern.

Für das BFS habe ich zum Beispiel bereits die folgenden relevanten Statistiken gefunden. Für Unternehmen:

  1. Strukturelle Unternehmensstatistik (STATENT)
  2. Statistik der Unternehmensdemografie UDEMO
  3. Unternehmensgruppenstatistik STAGRE
  4. (Angaben des Zolls)
  5. (MwSt.-Register)
  6. Unternehmens- und Betriebsregister BUR
  7. (Inkasso und Konkursstatistik)

Für Einzelpersonen:

  1. Schweizerische Arbeitskräfteerhebung SAKE
  2. Strukturelle Erhebung RS
  3. Schweizerische Erwerbsstrukturerhebung ESS
  4. Wertschöpfungsstatistiken
  5. Statistik über Einkommen und Lebensbedingungen SILC
  6. (Statistik über den Umsatz im Dienstleistungssektor)
  7. Statistiken zur Beschäftigung STATEM


Bonjour à toute la communauté,

Il s’agit de plusieurs projets de recherche à l’Université de Berne dont un des objectifs à terme est de quantifier l’effet économique des Open Government Data OGD en Suisse (après une partie de développement théorique), par des méthodes économétriques et de machine learning.

Est-ce que quelqu’un connaît des jeux de données contenant des variables économiques sur des organisations en Suisse ? Ceci au niveau de leur performance et caractéristiques (comme chiffre d’affaires, niveau d’investissement, …), et si possible avec une variable en rapport aux OGD (mais aussi sans). Le niveau d’agrégation étant de préférence celui des entreprises individuelles, mais le niveau communal ou cantonal serait également utile.

Il serait vraiment utile d’apparier différents jeux de données afin de quantifier certains des avantages des OGD, ce qui n’est pas encore réalisé dans la littérature scientifique. Des recommandations pourraient ainsi être formulées afin de promouvoir les avantages des OGD pour tout le monde.

J’ai déjà trouvé par exemple concernant l’OFS les statistiques appropriées suivantes. Pour les entreprises :

  1. Statistique structurelle des entreprises (STATENT)
  2. Statistique de la démographie des entreprises : pas de variable supplémentaire de l’UDEMO sur la STATENT
  3. Statistique des groupes d’entreprises STAGRE
  4. (Données des douanes)
  5. (Registres TVA)
  6. Registre des entreprises et des établissements BUR
  7. (Statistique des poursuites et faillites)

Pour les individus :

  1. Enquête suisse sur la population active ESPA
  2. Relevé structurel RS
  3. Enquête suisse sur la structure des salaires ESS
  4. Statistique de la valeur ajoutée
  5. Statistics on Income and Living conditions SILC
  6. (Statistique du chiffre d’affaires des services)
  7. Statistique de l’emploi STATEM

Dear @Andre, thank you for your nice question, it definitely sounds like interesting work! Still, I think it would be helpful to clarify your request a bit.

I am not sure what you mean by „a variable related to OGD“. Could you please specify it a little? Thank you!

Dear @Juan-Juan,

Thank you for your interest and for your question. To clarify, a „variable related to OGD“ could be for instance:

  1. A binary variable stating whether the company uses Open Government Data (OGD) or not (I know this information is most likely not recorded anywhere).
  2. The NOGA codes of the sectors in Switzerland that use OGD most intensively, similar to what was done for the European Union in San Chan et al., 2015 or DemosEurope & Wise (2014) with the NACE codes (on which the NOGA system is based).
  3. At the municipal level, the percentage of companies that use OGD.
  4. Still at the municipal level, the extent of OGD usage by companies.
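
To make items 1 and 3 concrete, here is a minimal sketch of how a firm-level binary flag could be aggregated into a municipal share. The function name, the record layout, and the sample firms are all hypothetical and invented for illustration; as far as I know, no such flag is recorded in any existing dataset.

```python
from collections import defaultdict


def ogd_share_by_municipality(firms):
    """Return the share of firms using OGD for each municipality.

    `firms` is an iterable of (municipality, uses_ogd) pairs, where
    `uses_ogd` is the hypothetical binary variable from item 1.
    """
    counts = defaultdict(lambda: [0, 0])  # municipality -> [users, total]
    for municipality, uses_ogd in firms:
        counts[municipality][1] += 1
        if uses_ogd:
            counts[municipality][0] += 1
    return {m: users / total for m, (users, total) in counts.items()}


# Invented sample records, not real data.
firms = [
    ("Bern", True), ("Bern", True), ("Bern", True), ("Bern", False),
    ("Thun", True), ("Thun", False),
]
shares = ogd_share_by_municipality(firms)
```

The same aggregation would work at the cantonal level by swapping the grouping key.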

I hope it is clearer now. I am aware these are tricky data to get, that is why I ask openly!

As recently discussed, I think the link to OGD, or let’s just say even Open Data, is quite tenuous but not unexpected. The purpose of corporate Annual Reports is to provide shareholders and the wider public with a mechanism for understanding strategic choices and the impacts of those choices, and to support the governance of those choices. What you are asking for seems to be something like a Data Governance Annual Report, in which a company's decisions to reuse public resources (OGD), or to invest in third-party or even completely new data sources, are made transparent.

Just to provoke some further thought - if data is the new oil, then we would be seeing these kinds of charts showing cause and effect of external events:

And, of course, the performance of data centers is being tracked in just this way and better. In other words, if you want data about performance, then you need to frame the question in a context where performance is relevant. A lot of OGD publications do not have a timeliness component at all, are updated once every few months - if at all. And I do not think the question „do you use OGD?“ is relevant as anything more than an ice-breaker, rather - at which point is OGD viable, what are the risks and benefits, and who has the keys?

I believe these were among core arguments of the now 10-year old BFH study that helped to justify and parallel initiatives. Surely there has been related research since then. Certainly the interest is apparent when you look overseas.

Things will get interesting if you talk to specific organizations, from a data protection perspective, and ideally they have both trackable processes which can generate statistics about data source utilization, and the interest in being transparent and proactive on this subject. This would, for instance, be quite a nice feature of sustainability reports.

I would imagine that we as a community should be quite open for further exchange on this topic. Thanks very much André for bringing it to the round table.

Thank you very much Oleg for your reflection and for your numerous sources!
Yes, you are right: the purpose is to assess the economic impact for companies in Switzerland, either 1) micro-economically, by measuring open data or OGD usage for the companies concerned (a binary used/not-used indicator is not sufficient), or 2) more macro-economically, with aggregation at the communal or cantonal level (the less preferable option).
The evolution can be estimated with a time-series or, better, a panel data model.
The difficulties are that 1) we of course do not have the counterfactual evolution without OGD, 2) we do not know exactly which confounders (other variables) play a role in either the adoption of OGD or economic value, and 3) obviously relevant information is either lacking, not precise enough, or too aggregated.
But it is possible to use proxy data (though less good), like the number of published datasets for (since 2016) or other OGD platforms, to compare communes and cantons with similar evolutions except for OGD, or to use econometric control methods (e.g., causal impact, Granger causality, difference-in-differences, …).
These research designs and methods would support a stronger causal interpretation instead of mere correlation.
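
As an illustration of the difference-in-differences idea, here is a minimal two-group, two-period sketch. It is only the textbook 2x2 estimator on invented numbers, not the design of the actual study; the function name and the data layout are assumptions, and a real analysis would use a panel regression with controls and would have to check the parallel-trends assumption.

```python
from statistics import mean


def did_estimate(rows):
    """Two-group, two-period difference-in-differences estimate.

    `rows` is an iterable of (treated, post, outcome) tuples:
    treated -- True for units exposed to OGD (e.g. a commune with a portal)
    post    -- True for observations after the OGD introduction
    outcome -- the economic variable of interest (e.g. turnover)
    """
    cells = {}
    for treated, post, outcome in rows:
        cells.setdefault((treated, post), []).append(outcome)
    needed = [(True, True), (True, False), (False, True), (False, False)]
    if any(key not in cells for key in needed):
        raise ValueError("each group needs observations before and after")
    m = {key: mean(values) for key, values in cells.items()}
    change_treated = m[(True, True)] - m[(True, False)]
    change_control = m[(False, True)] - m[(False, False)]
    return change_treated - change_control


# Invented outcomes, not real data.
rows = [
    (True, False, 100.0), (True, False, 110.0),
    (True, True, 130.0), (True, True, 140.0),
    (False, False, 90.0), (False, False, 100.0),
    (False, True, 100.0), (False, True, 110.0),
]
effect = did_estimate(rows)
```

In this toy example the treated group grows by 30 and the control group by 10, so the estimate is 20. The control group's change stands in for the missing counterfactual mentioned in point 1).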

But by the principle of „garbage in, garbage out“, good-quality and precise data is what it all comes down to (or almost!) in the end!
That is why I am asking for complementary data sources to improve the robustness and validity of the econometric analyses!

It is also key to have a thorough theoretical rationale first, which is why we are currently working on theoretical development. I am attempting to provide some answers to precisely the questions that you raise.

Yes, right. I know this study, which already demonstrated a substantial potential value of OGD in 2012. There are also these two Swiss studies from 2013: „Wirtschaftliche Auswirkungen von Open Government Data“ and „Open Government Data – Grundlagenstudie Schweiz 2013“.

But 1) there has been no new economic study on the impact of OGD in Switzerland since then, and 2) very few scientific articles, as opposed to public reports, use econometric methods.

As a result, it is in my opinion important, interesting, and timely to calculate this economic value now, after all the recent positive developments. This of course requires good-quality and complete data!

Thank you, @oleg , for your fascinating ideas.
And thanks to anyone who can share here some data sources of economic value.

