Ceci n'est pas donnée ("this is not data")

There is an entertaining 50-minute BBC & Open University documentary featuring Oxford Professor Marcus du Sautoy, The Secret Rules of Modern Living: Algorithms, which runs along the same lines as a discussion I had yesterday with colleagues @opendata.ch regarding the future of open data activism.

The algorithms that parse data and make it useful, whether as a visualisation, a statistic or a decision, are of course just as important as the data itself. We talk a lot about how open the data is, but not enough about how open the code is that collects, organises and repurposes it.

Rarely do we get code, or even pseudo-code, along with data extracted from a company or government organisation, unless they use open source software. Even then, we too easily assume that the code is too domain- or platform-specific to be of much use to our third-party activities. I wonder to what extent such assumptions lead to wheels being reinvented in scientific, commercial and activist circles.
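To make this concrete, here is a minimal sketch, in Python, of the kind of small processing script that could be published alongside a released dataset. The file name and column names are hypothetical, purely for illustration; the point is that even a dozen lines like these answer questions that metadata alone cannot: which columns were trusted, how values were parsed, what was aggregated.

```python
import csv
from collections import defaultdict

# Hypothetical input: a CSV released by an agency, with columns
# "canton" and "budget_chf". The file name is illustrative only.
INPUT_FILE = "open_spending.csv"

def totals_by_canton(path):
    """Sum the published budget figures per canton."""
    totals = defaultdict(float)
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            totals[row["canton"]] += float(row["budget_chf"])
    return dict(totals)

if __name__ == "__main__":
    for canton, total in sorted(totals_by_canton(INPUT_FILE).items()):
        print(f"{canton}: {total:,.2f} CHF")
```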

A recent paper called Fifty shades of open discusses this topic in the context of science (emphasis mine):

Scholarship has always been an open process; the idea of open science dates back to the very origin of modern science, and arguably even prior to that (Borgman, 2010). One of the foundations of the scientific method is that all work must be reproducible, and the only way for that to happen is if all processes are performed openly. The Open Science movement is therefore predicated on the idea that “full access to the major components of scientific research” is necessary (Hanwell, n.d.). This is necessary because more and more science is being done digitally, and therefore access to code has become as critical to reproducibility as access to methodology has always been. Furthermore, only by providing access to all processes — methodology (Kraker, et al., 2011), data, publications, peer review, even informal lab notebooks (Open Notebook Science Network, n.d.) — can modern science be true to the core values of the scientific enterprise.

This kind of thinking has prompted some interesting discussion about the Open Definition, and it’s something I would like to take up within our community.

Even if open data projects are not groundbreaking popular apps, each of them offers insight into the usability of the datasets, shines a light on some area of data use and lets us consider how the people who created the data think and work. The ones that produce code and usable demos let us experiment directly with that insight.

Speaking of demos, the demoscene community that we have been supporting here at SODA over the past year has a tradition that runs in the opposite direction: dazzle people with your code magic and share algorithmic secrets, while treating the data as a means to an end, if not merely a pseudorandom or fractal by-product of the creative spark.
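As an illustration of that data-as-by-product mindset, here is a minimal sketch in Python of a classic logistic-map iteration, the sort of procedural trick a demo might use to drive visuals or sound. Everything here is illustrative; the point is that the "dataset" only exists because the code conjured it.

```python
def logistic_orbit(r=3.9, x0=0.5, n=20):
    """Yield n points of the chaotic logistic map x -> r*x*(1-x)."""
    x = x0
    for _ in range(n):
        x = r * x * (1 - x)
        yield x

# The numbers below are a by-product of the code, not an input to it.
print([round(x, 4) for x in logistic_orbit()])
```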

Exploring open datasets through hackathons is an important public service, one that Opendata.ch provides in Switzerland and that Open Knowledge and many others provide globally.

Question: would it not be an even better service if we made an effort to make the algorithms that go along with the data less secret and more accessible, just as we make an effort to explain terms of use or to publish good metadata?
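One modest step in that direction is sketched below, assuming a dataset already described by a datapackage.json in the Frictionless Data style: reference the processing code right in the metadata. The "scripts" field here is a hypothetical extension of my own, not part of the spec, and the file names are illustrative.

```python
import json

# A minimal data package description. The "scripts" field is a
# hypothetical extension pointing at the code that produced or
# consumes the data; "name", "title" and "resources" are standard.
package = {
    "name": "open-spending-demo",
    "title": "Cantonal spending (demo)",
    "resources": [
        {"path": "open_spending.csv", "format": "csv"},
    ],
    "scripts": [
        {"path": "totals_by_canton.py",
         "description": "Aggregation used for the published figures"},
    ],
}

with open("datapackage.json", "w", encoding="utf-8") as f:
    json.dump(package, f, indent=2)
```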

Further threads of thought:

Update: revisiting this topic one year later, I have refreshed the link to the video. See also:

https://backchannel.com/the-myth-of-a-superhuman-ai-59282b686c62

…and join #mlftw if you’d like to chat about this topic.