On open data and its many checklists

Our friends at the Swiss Geoportal made a good point yesterday on Twitter, in the form of a question addressed to Opendata.ch and Parldigi: is the EU data portal's checklist for publishers, itself based on a checklist for consumers by the Open Data Institute, also applicable here in Switzerland?

Still with me? We are talking about this set of short exploratory questions, which boils down a contentious „what if …!?“ to a series of concrete measures with the potential to lead to improvements:


:ballot_box_with_check: how has the data been processed?
:ballot_box_with_check: is it in raw or summary form?
:ballot_box_with_check: how will its form affect your analysis/product/application?
:ballot_box_with_check: what syntactic (language) and semantic (meaning) transformations will you need to make?
:ballot_box_with_check: is this compatible with other datasets you have?


:ballot_box_with_check: how current is the data?
:ballot_box_with_check: how regularly is it updated?
:ballot_box_with_check: do you understand all the fields and their context?
:ballot_box_with_check: for how long will it be published? what is the commitment by the publisher?
:ballot_box_with_check: what do you know about the accuracy of the data?
:ballot_box_with_check: how are missing data handled?

Personally, I’m not sure how self-explanatory these questions really are, or why there is a seemingly arbitrary distinction between „form“ and „quality“. Let’s just shelve that for now.

Along with an explanation of some of these issues, the helpful Open Licence Assistant linked on that page aims to bring some clarity to the problem of having so many licenses to choose from. It looks like this, with some selections applied:

So, is this applicable? Here are some of the many places which explain various facets of the same thing:

This is, of course, not an exhaustive list. Yet, looking at it now, it seems to me that the role and relative authority of each is rather confusing to users and activists… not to mention the average citizen. We tend to just come back to the Open Definition, memorize it, and cross our fingers, hoping not to inadvertently contradict it.

How is a checklist useful? Publication in the form of data sharing requires overcoming lots of barriers, just like publication of other kinds. These range from a kind of organizational „writer’s block“, where there is no good reason for it not to happen yet it just doesn’t budge, to direct and prolonged resistance from people whose opinion shouldn’t matter. Checklists and measures help us get the facts lined up.

You would think that open data tools from the European Union should be applicable to some extent in Switzerland, but who am I to argue. As long as we have fundamental agreement - in that the different explanations link to each other, use comparable terminology, or are at least described and compared centrally - then checklists are relevant in any country, time and place. Doubly so if they help people to get their data out of the cold :snowflake:

Besides that, I am a coder; I like tools. The approach of the EU portal works for me: visual, data-driven, no-nonsense, and with no screenfuls of legalese to get through. And so I think that we (in data literacy projects like this one) should keep creating more interpretations: ones that are even more timely and useful to the community. In any case, checklists as a form of internal and external evaluation should be a central metric for the movement.

I would also say that we need more discussion about the alignment and merits of differing schemes and approaches. And I even have an inkling that we should work towards algorithmic proof of openness, which is one of the reasons I pay attention to things like Frictionless Data.
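To make that inkling a little more concrete, here is a minimal sketch in Python of what an automated checklist could look like. It assumes the dataset is described by a Data Package-style descriptor (a dictionary with `licenses` and `resources`, each resource optionally carrying a `schema`). The function name `check_openness`, the allow-lists `OPEN_LICENSES` and `OPEN_FORMATS`, and the wording of the four checks are all hypothetical choices made for illustration. It is closer to a lint than to a proof, and it is not part of any Frictionless Data tooling.

```python
# Illustrative allow-lists: hypothetical and far from complete.
OPEN_LICENSES = {"CC0-1.0", "CC-BY-4.0", "ODC-PDDL-1.0", "ODC-BY-1.0", "ODbL-1.0"}
OPEN_FORMATS = {"csv", "json", "geojson", "xml"}


def check_openness(descriptor):
    """Return a mapping of checklist item -> True/False for one descriptor."""
    licenses = descriptor.get("licenses", [])
    resources = descriptor.get("resources", [])
    # Collect the field definitions of every resource that has a schema.
    fields = [
        field
        for resource in resources
        for field in resource.get("schema", {}).get("fields", [])
    ]
    return {
        "has an open licence": any(
            lic.get("name") in OPEN_LICENSES for lic in licenses
        ),
        "uses open formats": bool(resources) and all(
            str(res.get("format", "")).lower() in OPEN_FORMATS for res in resources
        ),
        "fields are documented": bool(fields) and all(
            bool(field.get("description")) for field in fields
        ),
        "missing values are declared": bool(resources) and all(
            "missingValues" in res.get("schema", {}) for res in resources
        ),
    }


# A made-up descriptor, only to show the shape of the input.
example = {
    "name": "example-dataset",
    "licenses": [{"name": "CC0-1.0"}],
    "resources": [
        {
            "path": "data.csv",
            "format": "csv",
            "schema": {
                "fields": [
                    {"name": "id", "type": "integer", "description": "Row identifier"},
                    {"name": "value", "type": "number"},
                ],
                "missingValues": [""],
            },
        }
    ],
}

for item, passed in check_openness(example).items():
    print("[x]" if passed else "[ ]", item)
```

In this made-up example every item passes except the documentation check, because the `value` field has no description. Questions such as how current the data is, or how long the publisher commits to keeping it online, would need metadata that this sketch does not assume.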

There have been many initiatives to clarify the definitions and squarely put meaning behind the name. As 2018 rolls in, I firmly believe it is a topic very worthy of attention and debate. What do you think?