4 basic challenges in open data

Open data strategies are today on the front lines of many successful cooperation projects, including the worldwide fight against the coronavirus. Here are four key challenges in publishing and working with open data, adapted from the Turing Way project, which is itself inspired by the FAIR Guiding Principles:

Step 1: Make your data available

Put your non-personal data online: free of charge, under an open licence, available as a whole, and in accessible, non-proprietary formats. When you are looking for high-quality data resources, make sure that they are provided in formats and under terms of use that suit your goals. Always attribute your sources, even for data in the public domain.
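As one illustration of "accessible, non-proprietary formats", the sketch below writes tabular records to a UTF-8 CSV file using only Python's standard `csv` module. The records, column names, and file name are hypothetical placeholders, not a real dataset; CSV is just one common open format, and the licence still has to be stated alongside the file.

```python
import csv
from pathlib import Path

# Hypothetical example records; replace with your own non-personal data.
RECORDS = [
    {"year": 2019, "canton": "BE", "visitors": 1200},
    {"year": 2020, "canton": "BE", "visitors": 850},
]


def export_csv(records, path):
    """Write a list of dicts to a UTF-8 CSV file (an open, non-proprietary format)."""
    path = Path(path)
    # Use the keys of the first record as the header row.
    fieldnames = list(records[0].keys())
    # newline="" lets the csv module control line endings itself.
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)
    return path
```

Calling `export_csv(RECORDS, "visitors.csv")` produces a plain-text file that any spreadsheet, database, or programming language can read without a licence for proprietary software.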

Step 2: Make your data easy to understand

Ensure that the data is fully described, with accompanying documentation in clear, plain language, so that data users have enough information to understand its source and analytical limitations and can make informed decisions when using it. If the data you are trying to use is insufficiently described, contact the data publisher, ask for clarification, and follow up to check that the publication is updated with improved explanations over time. An open issue tracker goes a long way toward making this happen.

Step 3: Make your data easy to use

The data should be made available in modifiable, machine-readable, well-structured formats to support interoperability, traceability, and effective reuse. In many cases, this will include providing data in multiple standardized formats. I encourage and support the use of Frictionless Data Packages to create feedback loops that improve quality and ease of use across the divide between data publishers and data users.
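To give a feel for what a Frictionless Data Package looks like, here is a minimal sketch that builds a `datapackage.json` descriptor with Python's standard `json` module rather than the official `frictionless` library. The package name, title, file path, field list, and the `CC-BY-4.0` licence identifier are hypothetical choices for illustration; consult the Data Package and Table Schema specifications for the full set of properties. The per-field `description` entries also serve the documentation goal from Step 2.

```python
import json


def build_descriptor(name, title, csv_path, fields, license_id="CC-BY-4.0"):
    """Return a minimal Data Package descriptor as a dict.

    All argument values used in this example are hypothetical placeholders.
    """
    return {
        "name": name,  # lowercase, URL-friendly package identifier
        "title": title,
        "licenses": [{"name": license_id}],
        "resources": [
            {
                "name": name,
                "path": csv_path,
                "format": "csv",
                "mediatype": "text/csv",
                # Table Schema: one entry per column, with type and description.
                "schema": {"fields": fields},
            }
        ],
    }


# Hypothetical column documentation for a CSV file like the one from Step 1.
FIELDS = [
    {"name": "year", "type": "integer", "description": "Calendar year of the count"},
    {"name": "canton", "type": "string", "description": "Two-letter Swiss canton code"},
    {"name": "visitors", "type": "integer", "description": "Number of visitors counted"},
]

descriptor = build_descriptor(
    "museum-visitors", "Museum visitors per year", "visitors.csv", FIELDS
)
print(json.dumps(descriptor, indent=2))
```

Saving the printed JSON as `datapackage.json` next to the CSV file gives users and tools a single machine-readable place to find the licence, the file format, and the meaning of every column.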

Step 4: Make your data citable

Publish it on a website, a data portal, an open source repository, or all of the above. Make it easy for people to reference it with a permalink. In a research context, a DOI is recommended. The Swiss open data community exists to help ensure this is done well and well promoted. If the data you are using is not citable, this threatens the credibility and reproducibility of your analytical work, and you should not hesitate to contact the data owners to clarify the issue.

For another introduction, watch my 5-minute video (in German), read Was ist Open Government Data? from the Swiss e-government project, browse my notes from last semester, or check out the Open Data Showroom from the University of Bern, where I presented last week. Check out the Turing Way project that inspired this post, and contribute here: