More COVID open data

Hi all

Through a Freedom of Information request, I received some COVID data from BAG/FOPH on Jan 22nd.

The first file contains information, by canton and age group, on positive and negative tests. I suspect nothing there is new; everything was already public.

The second file contains so-called “line data”: one record per COVID case. Each case has canton, fall_dt and manifestation_dt fields. The first date is always present and corresponds roughly to when the case became known (see below). The second date is only sometimes present and corresponds to the date of symptom onset.

This might look like data that is already available elsewhere, but it does contain new and interesting elements. I believe two new types of analysis are possible:

  • per canton, compare the evolution over time of the fraction of cases for which this symptoms date is given
  • per canton, for the cases where the symptoms date is given, compare the evolution over time of the delay between the symptoms date and the case date.

Both should reflect the quality of the contact tracing interview process in each canton, assuming that the fraction of people who get tested after developing symptoms is more or less constant across cantons. This assumption might not always hold, since testing strategies can differ widely. It would be interesting to compare this data with the usual case counts. Overall, I expect a lot of discrepancies between the cantons.
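As a starting point, here is a minimal sketch of both analyses in plain Python. It is only an illustration under some assumptions of mine: each case is a dict with the keys canton, fall_dt and manifestation_dt, dates are ISO strings (YYYY-MM-DD), and a missing symptoms date is encoded as "NA" or an empty string. The function names are hypothetical, and grouping by ISO week of the case date is my choice, not something specified with the data.

```python
from collections import defaultdict
from datetime import date
from statistics import median


def parse_date(value):
    """Return a date, or None for a missing value (assumed to be '' or 'NA')."""
    if value is None or value.strip() in ("", "NA"):
        return None
    return date.fromisoformat(value.strip())


def weekly_symptom_stats(rows):
    """Group cases by (canton, ISO year, ISO week) of fall_dt.

    Returns {key: (n_cases, fraction_with_symptoms_date, median_delay_days)}.
    The delay is fall_dt minus manifestation_dt, in days; the median is None
    when no case in the group has a symptoms date.
    """
    groups = defaultdict(list)
    for row in rows:
        case_dt = parse_date(row["fall_dt"])  # always present per the data description
        onset_dt = parse_date(row["manifestation_dt"])
        iso = case_dt.isocalendar()
        key = (row["canton"], iso[0], iso[1])
        delay = (case_dt - onset_dt).days if onset_dt is not None else None
        groups[key].append(delay)

    stats = {}
    for key, delays in groups.items():
        known = [d for d in delays if d is not None]
        stats[key] = (
            len(delays),
            len(known) / len(delays),
            median(known) if known else None,
        )
    return stats
```

If the file is a CSV with those column names (I have not checked the exact layout), the rows could come from csv.DictReader and be passed straight to weekly_symptom_stats. Plotting the fraction and the median delay per canton over the weeks would then give the two comparisons described above.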

There is currently no quantitative metric for the quality of contact tracing, but this should become an important issue, as a new amendment ties some federal financing to it.

Below is some additional information I obtained to clarify what the data means:

fall_dt: case date. This is the earliest of the following dates: date of registration (reception by BAG), date of test, date of sample. Because of this, the case date for a specific case can change as we receive more information. For COVID-19, this date usually represents the test date, which is why the number of cases for the last 2-3 days is provisional.
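To make the "earliest date" rule concrete, here is a tiny sketch. The function and parameter names are mine and purely illustrative; this is not BAG's actual code.

```python
from datetime import date


def case_date(registration_dt=None, test_dt=None, sample_dt=None):
    """Return the earliest of the dates known so far, or None if none is known."""
    known = [d for d in (registration_dt, test_dt, sample_dt) if d is not None]
    return min(known) if known else None


# With only the registration date known, that date is the case date.
first_report = case_date(registration_dt=date(2021, 1, 10))
# Once an earlier test date arrives, the case date moves back.
updated = case_date(registration_dt=date(2021, 1, 10), test_dt=date(2021, 1, 8))
```

This is why a case can shift to an earlier day after the fact, and why counts for the most recent days keep changing.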

manifestation_dt: date of manifestation. Taken from the clinical declaration, this is the date on which the first symptoms appeared.

NA = not available (not reported, most probably because it is not known). As the testing strategy involves only testing symptomatic people, “no symptom” should not be in the data, although we cannot exclude it of course (if someone asymptomatic convinced a test centre to test them and then tested positive). There is one exception: the mass tests by GR, BL and the Army, which account for a minority of <1 %, spread over weeks.

The data we shared is a line listing: every line is an individual entry (no “bleed” to other rows). (POD: this answers a question I had. If a date appears somewhere, it applies only to that row.)

Take good care of that data, and please post here if you do something with it!
