Open Data @ BFH CAS DA

This semester we turned my regular Open Data lectures at the Bern University of Applied Sciences CAS in Data Analysis into a full-day workshop. The intent of this class is to present a practitioner's perspective, as well as some introductory background on open data and its global movement, through several real-world projects - with details of the data involved, legal conditions and technical challenges.

“Where there is perfect certainty, there is no information: There is nothing to be said.”

– Jimmy Soni & Rob Goodman on Claude Shannon

“At the core of Bayesian statistics is the idea that prior beliefs should be updated as new data is acquired.”

– Image and quote from Seeing Theory, Daniel Kunin et al., Brown University

After covering the basics of Internet data protocols and standards, we discussed the foundations of open data globally and in a Swiss context, looked at various portals and the software behind them, practiced downloading datasets and opening them in web and desktop software, and talked about the implications of licensing constraints and file formats.
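
For readers who want to try the portal step from a script rather than a browser, here is a minimal sketch. It assumes the portal runs CKAN and exposes the standard Action API at the base URL shown, which is an assumption and not something we verified in class. The function names are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the portal is CKAN-based and serves the Action API at this address.
PORTAL = "https://ckan.opendata.swiss"


def search_url(query, rows=5, portal=PORTAL):
    """Build a CKAN package_search URL for a free-text query."""
    return f"{portal}/api/3/action/package_search?" + urlencode({"q": query, "rows": rows})


def list_resources(response):
    """Extract (dataset name, format, download URL) from a package_search response."""
    found = []
    for dataset in response["result"]["results"]:
        for resource in dataset.get("resources", []):
            found.append((dataset["name"], resource.get("format", ""), resource.get("url", "")))
    return found


def search(query, rows=5):
    """Query the portal over HTTP (needs network access)."""
    with urlopen(search_url(query, rows), timeout=30) as reply:
        return list_resources(json.load(reply))
```

Calling `search("brunnen")` would list the files attached to the first few matching datasets, which makes it easy to see at a glance which formats a publisher offers.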

We went through some showcases, and illustrated the importance of connecting data to its context and applications. Social impact and business value continue to be generated through usage and feedback loops, in which interest groups and communities endeavour to make data - already open data but in principle any data - even more usable and accessible to a wider public. One important vehicle is the hackathon, a public event where data owners and users meet to brainstorm and prototype possible new uses for data.

Data literacy means being an active user of data, being aware of possible “bugs” in the facts and opinions of others - ultimately the ability to base one’s own decisions on verifiable evidence. There are several projects in Switzerland to improve educational material and create shared resources for data literacy. The OGD Handbook at handbook.opendata.swiss provides guidance for government and people who work with the public sector. At Opendata.ch we have programs like schoolofdata.ch, part of a civil society initiative involved in research with international organizations.

In the hands-on session, we collected ideas and discussed how to get to the data in a number of interesting scenarios. Like the law, open data has personal, institutional and democratic sides to it, and we learned about some of the boundaries between private and public data, the mechanisms with which it is published, and the forms in which it leads to effective collaboration.

Participants were interested to learn more about the next generation of tools at Open Knowledge, where an initiative called Frictionless Data creates standards and tools for publishing ‘Data Packages’, which are complementary to open data portals, in that they foster exchange of metadata within a wider community, encourage simple standards of universal access, and provide a mechanism for data validation, stricter attribution and better referencing of terms of use.
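
To make the idea of a Data Package a little more concrete, here is a rough sketch of the kind of descriptor (usually saved as `datapackage.json`) that travels alongside the data files. It is written with plain dictionaries and does not use the official Frictionless tooling. The helper names and the example values are hypothetical. Only `resources` is required by the specification as far as I know; the check for `licenses` and `sources` is a stricter rule added here to reflect the attribution point above.

```python
import json


def make_data_package(name, title, csv_path, fields, license_name, source_title):
    """Assemble a minimal Data Package descriptor as a plain dict."""
    return {
        "name": name,
        "title": title,
        "licenses": [{"name": license_name}],
        "sources": [{"title": source_title}],
        "resources": [{
            "name": name,
            "path": csv_path,
            # Table Schema: one entry per column, with a declared type.
            "schema": {"fields": [{"name": n, "type": t} for n, t in fields]},
        }],
    }


def missing_metadata(descriptor):
    """List metadata gaps; licenses and sources are our own extra requirement."""
    missing = []
    resources = descriptor.get("resources") or []
    if not resources:
        missing.append("resources")
    for resource in resources:
        if "path" not in resource and "data" not in resource:
            missing.append(f"resources[{resource.get('name', '?')}].path")
    for key in ("licenses", "sources"):
        if not descriptor.get(key):
            missing.append(key)
    return missing


if __name__ == "__main__":
    # Hypothetical example values, for illustration only.
    package = make_data_package(
        "municipalities", "Municipalities (example)", "municipalities.csv",
        [("id", "integer"), ("name", "string")], "CC-BY-4.0", "Example statistics office",
    )
    print(json.dumps(package, indent=2))
    print("Missing:", missing_metadata(package))
```

Keeping the licence and source next to the column definitions is what lets a downstream user check terms of use without going back to the portal.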

With the class, we brainstormed ideas for topics that would be interesting to explore in data:

  • There’s rain, and there’s rain… How can I be better informed about sudden monsoons like today?
  • The analysis of global equity trading data would certainly be exciting. Who buys, who sells, with what frequency…?
  • I would like to see real-time information on the current position of trains and buses on a map. You could imagine an app that generates a notification when the train is 1 km from my station… I could go to the station just in time.
  • A lot of data is open, but not for free. For example, I would like to receive land register extracts or tax information free of charge.
  • Health data for the whole of Switzerland
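
The train notification idea from the list above comes down to a distance check between two coordinates. Here is a small sketch using the haversine formula. It assumes vehicle positions are available as latitude/longitude pairs, which is not something we confirmed for any particular feed, and the function names and the 1 km default are only illustrative.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, good enough for short distances


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, using the haversine formula."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def should_notify(train, station, threshold_km=1.0):
    """True when the train's (lat, lon) is within threshold_km of the station."""
    return distance_km(*train, *station) <= threshold_km
```

A real app would also need the train's direction and schedule, since a train 1 km away may be leaving rather than arriving.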

Using example materials of a previous exercise based on open statistical data on the Municipalities of Switzerland, as well as an evaluation framework from the OGD Handbook Project, we then stress tested three open datasets and explored usage scenarios:

Linked Open Data: Water well in Zurich
https://db.opü.ch/project/10

  • Is the dataset easy to find?
  • Are support options and legal guidance clear?
    • Yes, best on opendata.swiss
  • Is provenance and context of data usage specified?
  • Can the full dataset be easily downloaded?
    • Yes
  • Is the data well compiled and structured?
    • No table (CSV) available, only JSON and geodata with various attributes on opendata.swiss
    • At Wikidata the structure is more clearly presented, with different export formats
  • Are there usability or accessibility issues?
    • In Wikidata it’s not immediately clear how to get year of construction and other attributes
  • What do the results of automated validation tell us?
  • Any overall feedback or recommendations?
    • Water data is not just for fairies! :fairy:

Frictionless Data Package with financial data

  • Is the dataset easy to find?
    • The original data source is very easy to find.
  • Are support options and legal guidance clear?
    • no
  • Is provenance and context of data usage specified?
  • Can the full dataset be easily downloaded?
    • Yes
  • Is the data well compiled and structured?
    • yes
  • Are there usability or accessibility issues?
    • no
  • What do the results of automated validation tell us?
    • no surprises… (summary)
  • Any overall feedback or recommendations?
    • The top result in a Datahub search leads not to the “core” dataset but to a poorly maintained fork.

Datawrapper: Catering facilities in Zurich
https://db.opü.ch/project/34

  • Is the dataset easy to find?
    • A: Yes, directly from the graphic (directly as CSV)
  • Are support options and legal guidance clear?
    • A: Not in the graphic, but below in the text (link “Open Data”) and at the data source
  • Is provenance and context of data usage specified?
    • A: Yes in graphic
  • Can the full dataset be easily downloaded?
    • A: Yes as .csv
  • Is the data well compiled and structured?
    • A: Yes
  • Are there usability or accessibility issues?
    • A: The graphics are not self-explanatory. Metadata is missing.
  • What do the results of automated validation tell us?
    • A: None found
  • Any overall feedback or recommendations?
    • A: Code for graphic generation does not exist
    • A: File data not accessible
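
The “automated validation” question in the checklists above can be approximated with a few lines of code when no validation report is available. This is only a sketch of the idea and not the validator we used in class. The function name and the expected column types are assumptions you would adapt to the dataset at hand.

```python
import csv
import io


def validate_csv(text, expected_types):
    """Return a list of human-readable problems found in CSV text."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]
    problems = []
    header = rows[0]
    for column in expected_types:
        if column not in header:
            problems.append(f"missing column: {column}")
    # Data rows start on line 2 of the file.
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(f"line {line_no}: expected {len(header)} cells, found {len(row)}")
            continue
        for column, value in zip(header, row):
            caster = expected_types.get(column)
            if caster is None or value == "":
                continue  # no declared type, or an empty cell
            try:
                caster(value)
            except ValueError:
                problems.append(f"line {line_no}: {column}={value!r} is not {caster.__name__}")
    return problems
```

For example, `validate_csv(text, {"year": int, "count": int})` flags rows with a wrong number of cells and values that do not parse as integers.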

Feedback loops

We contacted the Zurich Open Data authorities regarding issues with the Catering facilities dataset above. Here is their answer (response time: 1 hour!):

The record you found is actually a bit old, we should remove it. The current record is here: https://data.stadt-zuerich.ch/dataset/sid_wipo_gastwirtschaftsbetriebe. There you will find information on all catering establishments and their categorisation. The dataset contains a time series with the year-end stocks.

:+1:

Via Moodle, I have suggested an exercise to work with CKAN portals and Data Packages, and am open to questions about using and publishing open data in the R analytical environment. My slides are available under a Creative Commons License here: https://bit.ly/bfhcasda2019