Fear not the data dungeon #CKANmonthly

My slides and notes added after the discussion, from a presentation at the CKAN Monthly community meeting today. Thanks a lot for the invitation, to all the people in the room, and congratulations on the 2.10 release! :tada:

https://ckan.org/blog/the-latest-ckan-release-is-here-say-hi-to-ckan-210

Side note: of the roughly 40 attendees there were at least 6 people who also participate in the CKAN dev calls, and IIRC about 10 who acknowledged oldschool vibes. Still, I wonder how many people got the title of my talk, a reference to Data Expeditions and CKAN hackathons of a decade past?


Fear not the data dungeon!

Image created using DALL·E 2 by OpenAI, based on data aggregators like Common Crawl, and the tireless efforts of millions of content creators around the Web.

Screenshot of commoncrawl.org


Image source (a blog post reviewing a book by Prof. Mario Fischer)

We use CKAN to search for authoritative sources of data, a friendly and secure page full of metadata guiding us to resources - rather than dubious and troublesome Excel files buried deep in the murky filesystem of a random server. Finding ‘things’ online … it is a question of fear, uncertainty and doubt (FUD) despite - or maybe because of - the valiant efforts of search engine optimization (SEO). Any other parents here, wondering what garbage search engines spit out at their children’s queries? No, ah, ok, back to your phones you go. Anyone else here following the fun with lies (HowToGeek) that is the world’s obsession with ChatGPT (Wired)? So you know that Access to Data (a.k.a. concise statements whose veracity could be more easily checked by evidence) is a Very Good Thing - but not all doorways are equally welcoming.


Screenshot of DuckDuckGo Image search

Oops!.. Even if you use all caps and quotes, "CKAN" is mixed up with the Mexican musician C-Kan in DuckDuckGo’s search results. Google is, by the way, not much better at this - but at least you see a couple of CKAN logos in the image search results. The fallacies of search, of SEO, and the roots of many of our complaints with A.I. tools like ChatGPT, should be obvious to anyone by now. Thanks to CKAN, and the tireless efforts of portal-deployers and catalog-maintainers around the world, we can truly say today: Open Data is great for SEO! And what’s great for SEO … is usually great for A.I., too. So here’s to more FUD! I mean SEO :cold_face:


Soy un francotirador apunto, preparo rimas,
soy certero nunca fallo haré que sangre tu autoestima,
[…] tengo flow y rimas no me hace falta nada,
quieres grabar aquí pues la calidad se paga

– C-Kan ft. Big Rapper - No Fear, No Mercy (2013)

I am a sniper - I aim, I prepare rhymes,
I’m so accurate I’ll never miss I’ll make your self-esteem bleed
[…] I got flow and rhymes, I don’t need nothing,
You want to record here - the quality is paid for

Translated with DeepL

It’s kind of fun when CKAN gets confused with a rapper, especially one whose lyrics seem to reflect a fondness for “flow”, “accuracy” and “quality”. A straight-laced marketing approach for an enterprise software product would try to distance itself from this. Cultural appropriation (as opposed to cultural collaboration - BBC) would be wrong, if you get my gist. And if you weren’t convinced that CKAN rocks in my previous slide, you are now. Go fight FUD with some “flow y rimas” :guitar:


Pictured above: a data visualisation from Audio Analysis, a hackathon project involving audio analysis and automatic transcription of a pirate radio station at GLAMhack’22 (Infoclio), the Swiss annual OpenGLAM event. It’s the kind of project where open data meets machine learning to empower critical voices, and the potential for public impact is high. Please be warmly invited to GLAMhack 2023 in Geneva at the end of September :hugs:


hello.world(‘oleg’)

Who is doing the inviting: a freelancer solopreneur coder with a cat’s sense for content management glitches - as you would have if you had also been building websites since your teenage years - dedicated to furthering the art and science of commas; sharing,data,with,<3

As @loleg you might have seen me active in the Open Knowledge network, run data literacy workshops, consult renowned institutions, blog on occasion, commit with pride, and - always - try to Pull Request with deference (see also: PR etiquette - Hackernoon). What else? Canadian-Swiss expat, citizen of a climatically destabilised planet, 8-bit space nerd, family man, et cetera.


10 challenges

My input today is these ten ways to ‘hack’ CKAN for fun and/or profit. Think of this as a bunch of potential challenge topics for the next <hint>CKAN hackathon</hint>


(1) Open data is a kind of honey pot

Pictured above is my humble submission to Ludum Dare 21 (#54 will take place at the end of September) - a game where you try to guide some honeybees to the exit with a cube of honey. A bit like herding cats, the bees are wont to ignore your bait and bump stubbornly into walls, wasting time. This seems to be a passable metaphor for the way open data is used to herd data users (developers, researchers), through more points of engagement with data publishers. Games are just a great format to invite people to hack open source (GitHub) … Bzzzt! :honeybee:

As another kind of ‘honeypot’, CKAN might also be used to train IT departments in careful publication of data and metadata, educating them in tech and legal policies, preparing them for leaks and attacks. There is a lot we could do to make community interactions with open data an opportunity for building capacity in Information Security. Which brings me to …



(2) Make CKAN more hackable

Screenshot of ckan.org/features/security

Understanding that everything is hackable is the first step of a long journey of Internet-fu. Encouraging pentesting in user communities, spreading learnings and tools to (API) users openly, training extension developers and portal maintainers … OWASP CRS (DINAcon) is an example of how to interact with a security community, and I’d love to hear your own stories :ear: We are part of an ecosystem, and share the common fate of many successful software projects (think WordPress, Windows, Java, Shockwave …) that have been the worryworms of devops. Bounties (OpenCollective) and Capture-the-Flag (ctftime) are the most widespread methods to crowdsource attention to an open source product’s footprint. They do not replace, but may well complement, a dedicated professional’s evaluation.

Let’s keep making CKAN great for developers - with a secure, open, high performance API and transparent security footprint. Check out ckanext-security and harden your instance, mate!


(3) Support & champion data (re)users

Screenshot of Showcases | opendata.swiss

The Showcase extension is probably my favorite page on the portal. Here you can really see how the data connects to applications. This is a place where I would love to see more stories and ‘raw’ hackathon projects, not just polished apps (or, as my screenshot exhibits, the very Swiss preponderance for clock-like dials and maps). We could make it easier to build user experiences through data publication, storify the legal or technical hurdles that are overcome in the effort to put data online. There is an ongoing discussion about connecting CKAN via DCAT, RSS, ActivityPub, and other protocols to fresh channels, for a new audience.

We should hack this for Open Data Day.


(4) Induce participation in data workflows

Screenshot of Proxeus · GitHub

This is a project I’ve been tinkering with for the past year, with which I would like to make it easier to design workflows around data collection and processing using the Proxeus ‘no code’ plug-in model. There are many such business tools used to make digitalisation or data management more visual and accessible.

My money is on open data that is small, self-publish(able), actionable - not only because my resources are modest, but because that’s how data stays personal. CKAN’s awesome foundations in federation of portals make it a prime environment for data replication across organisations or whole sectors - or for resilient data refuges in activism. A cool hack in this vein would also be to combine the plug-n’play design of morph.io with a Blockly environment for scrapers.

Let me know if you are also tinkering with such things, and see potential in a CKAN integration via ckanext-workflow or otherwise.


(5) Data catalogs in the age of misinformation

Screenshot of https://memes.sucho.org/

In general, the better we can recognize participation, the more the whole community will benefit through new incentives and structures. But why would people participate in the first place? I have been thinking a lot about the interfaces between data stewardship, volunteering, and the gig economy - and I think that having the right cause is a big driving factor.

Even though open data is often trumpeted, with perfectly good reason, as a weapon against misinformation (University of Bristol), we could pay more attention to features that make it easier to validate and compare sources.

See also my ODD’22 page Cultural Refuge and Hack4SocialGood, a research project culminating in an event in Switzerland at the end of March.


(6) Not all user journeys start at the landing page

Screenshot of DuckDuckGo

Of course, many do. Having access to web portal analytics - like some outstanding portals I know - lets us as a community better understand the ‘open data marketplace’: which topics are trending, where gaps exist. At least, this was something my friend Konstantin and I suggested to a bunch of designers in 2016.

It is interesting how users interchange data platforms and websites, especially once a new generation of power-users is willing to go beyond the first or second link in a search result and explore the data in their own app. Understanding data reuse patterns among the apps and other downstream users of data takes this to the next level. I hear there is a lot cooking in this kitchen, and am keen to hear more :ramen:

Just as a historical side note: before web pages got “rich”, having to spend time massaging the content - with the plethora of file formats, and archive formats (Wikipedia) on top of those - was a normal part of the BBS-era and even early Web experience. And also - remember these guys gracing the footer of every website in the 90’s? They were sometimes linked to a full-fledged web analytics viewer, no (data protection) questions asked - hard to believe, today:

Screenshot of a Hit Counter generator via hostmysite.com


(7) No data is an island

Where data stewards meet data makers. Check out the full, original & delightful Data Access Map (CC BY), illustrated by Ian Dutnall for the ODI. I added the letters to invite more discussion of the role of public network infrastructure.

Wide-area networks, from public WiFi to cheap mobile data plans, are enablers of whole classes of the economy. Yet access is not evenly distributed, as proponents of Digital Rights (EFF) or Information Justice (openfuture) will be keen to remind us. This is why protocols like LoRaWAN (Instructables), that help to democratise the maintenance of sensor networks and edge computing, are important investments.

See also ONIA, and look forward to hearing about the intersection of open data, IoT and security in Deborah Mesquita’s csv,conf,v7 talk.


(8) Make excellent feedback loops

Image by @afsoonica from a MakeZurich project, as discussed here

Xeno-canto is an incredibly cool crowdsourced dataset of wildlife sounds, and is relied upon by thousands of projects like this one. One can easily be inspired by their flexible search syntax or the Data Mysteries page.

How could CKAN make it easier to contribute here, or start a new effort? How do we inspire and educate others in data stewardship? At what stage, and in what way, can we best plug in data collection projects (Kobo, ODK, …)? Every good library should have a section dedicated to how to be a better writer (WikiHow). That’s what I’m talkin’ gaffing about.


(9) Pinboards? Rules! Constraints? Scores!

Screenshot of dribdat for makezurich.ch

Make making data great again: this is the goal of dribdat, a project I initiated and continue to support, just one of the many awesome tools available to hackathon organizers to make playful, legal, experimentative, inclusive hacking a fun and rewarding activity.

Applied correctly, it is a powerful community building tool. But we are all very much aware of hackathon fatigue, when things go out of balance. It is important to document our ideas, attempts, successes and failures, so that even the smallest contributions count.

In the latest release, dribdat automatically tries to enrich project descriptions with metadata from CKAN’s API using ckan-embed. I would be happy to hear your experiences with documenting open data projects, and to create some playful (or ludic) experiences that make friends want to spend time together enjoying some delicious data cake in each other’s company. No feathers ruffled :peacock:
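For the curious, here is a minimal Python sketch of what ‘enriching a project description with metadata from CKAN’s API’ can look like. It is not how dribdat or ckan-embed actually do it (ckan-embed is a JavaScript library); it only assumes the standard `package_show` action of CKAN’s Action API, and the helper names (`package_show_url`, `summarize`, `fetch_summary`) are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def package_show_url(portal, dataset_id):
    """Build the Action API URL for one dataset's metadata."""
    query = urlencode({"id": dataset_id})
    return portal.rstrip("/") + "/api/3/action/package_show?" + query


def summarize(response):
    """Reduce a package_show response to a few fields for a project page."""
    if not response.get("success"):
        raise ValueError("CKAN reported an error")
    result = response["result"]
    formats = {r["format"] for r in result.get("resources", []) if r.get("format")}
    return {
        # Fall back to the URL slug when no human-readable title is set
        "title": result.get("title") or result.get("name"),
        "license": result.get("license_title"),
        "formats": sorted(formats),
    }


def fetch_summary(portal, dataset_id):
    """Fetch and summarize a dataset (needs network access)."""
    with urlopen(package_show_url(portal, dataset_id), timeout=10) as resp:
        return summarize(json.load(resp))
```

Something like `fetch_summary("https://demo.ckan.org", "some-dataset")` would then return a small dictionary to render next to a hackathon project; the dataset id here is a placeholder, and portals with multilingual metadata may return richer structures than this sketch expects.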


Image from Opendata.swiss Handbook


(10) Ask how does data really get used?

Screenshot of Hoppscotch

Iteratively, expressively. API first. This is engineering as it is practiced in the so-called industry. Data has never really been just about static files - and today, machines are still connected to each other by the loving grace of humans, with tools like the above.


Written by Liyas Thomas in I created Hoppscotch :alien: - Open source API development ecosystem

See also: CKAN OpenAPI viewer
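To make ‘iteratively, API first’ a bit more concrete, here is a small Python sketch of paging through search results with CKAN’s `package_search` action, the kind of request you might first poke at in a tool like Hoppscotch. The `q`, `rows` and `start` parameters and the `count` / `results` fields belong to the Action API; the function names are my own and purely illustrative.

```python
from urllib.parse import urlencode


def search_url(portal, query, rows=5, start=0):
    """Build a package_search URL for one page of results."""
    params = {"q": query, "rows": rows, "start": start}
    return portal.rstrip("/") + "/api/3/action/package_search?" + urlencode(params)


def dataset_names(response):
    """List the dataset slugs contained in one page of results."""
    return [dataset["name"] for dataset in response["result"]["results"]]


def next_start(response, start, rows):
    """Return the offset of the next page, or None when there is none."""
    following = start + rows
    return following if following < response["result"]["count"] else None
```

The loop is then simple: request a page, use the names, ask `next_start` where to continue, and stop when it returns `None`.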


Image courtesy of The Federal Chancellery.

:soon: GovTech Hackathon – Opendata.ch


Screenshot of Wolfram Alpha.

:mantelpiece_clock: Time is ticking … :sparkles: How do we make hackathons more fair?


So at the next hackathon: ask not what your data can do for you

Ask what you can do for your data!

Image generated using pokemon-stable-diffusion, trained by Justin Pinkney


Thanks.

:thumbsup: :thumbsdown:

Creative Commons Licence
This presentation by Oleg Lavrovsky is licensed under a
Creative Commons Attribution 4.0 International License.