My slides and notes added after the discussion, from a presentation at the CKAN Monthly community meeting today. Thanks a lot for the invitation, to all the people in the room, and congratulations on the 2.10 release!
https://ckan.org/blog/the-latest-ckan-release-is-here-say-hi-to-ckan-210
Side note: of the roughly 40 attendees, at least 6 also participate in the CKAN dev calls, and IIRC about 10 acknowledged old-school vibes. Still, I wonder how many people got the title of my talk, a reference to Data Expeditions and the CKAN Hackathons of a decade past?
Fear not the data dungeon!
Image created using DALL·E 2 by OpenAI, based on data aggregators like Common Crawl, and the tireless efforts of millions of content creators around the Web.
Screenshot of commoncrawl.org
Image source (a blog post reviewing a book by Prof. Mario Fischer)
We use CKAN to search for authoritative sources of data, a friendly and secure page full of metadata guiding us to resources - rather than dubious and troublesome Excel files buried deep in the murky filesystem of a random server. Finding "things" online … it is a question of fear, uncertainty and doubt (FUD) despite - or maybe because of - the valiant efforts of search engine optimization (SEO). Any other parents here, wondering what garbage search engines spit out at their children's queries? No, ah, ok, back to your phones you go. Anyone else here following the fun with lies (HowToGeek) that is the world's obsession with ChatGPT (Wired)? So you know that Access to Data (a.k.a. concise statements whose veracity could be more easily checked by evidence) is a Very Good Thing - but not all doorways are equally welcoming.
Screenshot of DuckDuckGo Image search
Oops!.. Even if you use all caps and quotes, "CKAN" is mixed up with the Mexican musician C-Kan in DuckDuckGo's search results. Google, by the way, is not much better at this - but at least you see a couple of CKAN logos in the image search results. The fallacies of search, of SEO, and the roots of many of our complaints with A.I. tools like ChatGPT, should be obvious to anyone, now. Thanks to CKAN, and the tireless efforts of portal-deployers and catalog-maintainers around the world, we can truly say today: Open Data is great for SEO! And what's great for SEO … is usually great for A.I., too. So here's to more FUD! I mean SEO
Soy un francotirador apunto, preparo rimas,
soy certero nunca fallo haré que sangre tu autoestima,
[…] tengo flow y rimas no me hace falta nada,
quieres grabar aquí pues la calidad se paga
- C-Kan ft. Big Rapper - No Fear, No Mercy (2013)
I am a sniper - I aim, I prepare rhymes,
I'm so accurate I'll never miss, I'll make your self-esteem bleed
[…] I got flow and rhymes, I don't need nothing,
You want to record here - the quality is paid for
Translated with DeepL
It's kind of fun when CKAN gets confused with a rapper, especially one whose lyrics seem to reflect a fondness for "flow", "accuracy" and "quality". A straight-laced marketing approach for an enterprise software product would try to distance itself from this. Cultural appropriation (as opposed to cultural collaboration - BBC) would be wrong, if you get my gist. And if you weren't convinced that CKAN rocks in my previous slide, you are now. Go fight FUD with some "flow y rimas"
Pictured above: a data visualisation from Audio Analysis, a hackathon project involving audio analysis and automatic transcription of a pirate radio station at GLAMhack'22 (Infoclio), the Swiss annual OpenGLAM event. It's the kind of project where open data meets machine learning to empower critical voices, and the potential for public impact is high. Please be warmly invited to GLAMhack 2023 in Geneva at the end of September
hello.world("oleg")
Who is doing the inviting: a freelancer solopreneur coder with a cat's sense for content management glitches - as you would have if you had also been building websites since your teenage years - dedicated to furthering the art and science of commas; sharing,data,with,<3
As @loleg you might have seen me active in the Open Knowledge network, run data literacy workshops, consult renowned institutions, blog on occasion, commit with pride, and - always - try to Pull Request with deference (see also: PR etiquette - Hackernoon). What else? Canadian-Swiss expat, citizen of a climatically destabilised planet, 8-bit space nerd, family man, et cetera.
10 challenges
My input today is ten ways to "hack" CKAN for fun and/or profit. Think of this as a bunch of potential challenge topics for the next <hint>CKAN hackathon</hint>
(1) Open data is a kind of honey pot
Pictured above is my humble submission to Ludum Dare 21 (#54 will take place at the end of September) - a game where you try to guide some honeybees to the exit with a cube of honey. A bit like herding cats, the bees are wont to ignore your bait and bump stubbornly into walls, wasting time. This seems to be a passable metaphor for the way open data is used to herd data users (developers, researchers), through more points of engagement with data publishers. Games are just a great format to invite people to hack open source (GitHub) … Bzzzt!
As another kind of "honeypot", CKAN might also be used to train IT departments in careful publication of data and metadata, educating them in tech and legal policies, preparing them for leaks and attacks. There is a lot we could do to make community interactions with open data an opportunity for building capacity in Information Security. Which brings me to …
(2) Make CKAN more hackable
Screenshot of ckan.org/features/security
Understanding that everything is hackable is the first step of a long journey of Internet-fu. Encouraging pentesting in user communities, spreading learnings and tools to (API) users openly, training extension developers and portal maintainers … OWASP CRS (DINAcon) is an example of how to interact with a security community, and I'd love to hear your own stories. We are part of an ecosystem, and suffer the common fate of many successful software projects (think WordPress, Windows, Java, Shockwave …) that have been the worryworms of devops. Bounties (OpenCollective) and Capture-the-Flag (ctftime) are the most widespread methods to crowdsource attention to an open source product's footprint. They do not replace, but may well complement, a dedicated professional's evaluation.
Let's keep making CKAN great for developers - with a secure, open, high performance API and transparent security footprint. Check out ckanext-security and harden your instance, mate!
(3) Support & champion data (re)users
Screenshot of Showcases | opendata.swiss
The Showcase extension is probably my favorite page on the portal. Here you can really see how the data connects to applications. This is a place where I would love to see more stories and "raw" hackathon projects, not just polished apps (or, as my screenshot exhibits, the very Swiss preponderance of clock-like dials and maps). We could make it easier to build user experiences through data publication, and storify the legal or technical hurdles that are overcome in the effort to put data online. There is an ongoing discussion about connecting CKAN via DCAT, RSS, ActivityPub, and other protocols to fresh channels, for a new audience.
We should hack this for Open Data Day.
(4) Induce participation in data workflows
Screenshot of Proxeus · GitHub
This is a project I've been tinkering with for the past year, with which I would like to make it easier to design workflows around data collection and processing using the Proxeus "no code" plug-in model. There are many such business tools used to make digitalisation or data management more visual and accessible.
My money is on open data that is small, self-publish(able), actionable - not only because my resources are modest, but because that's how data stays personal. CKAN's awesome foundations in federation of portals make it a prime environment for data replication across organisations or whole sectors - or for resilient data refuges in activism. A cool hack in this vein would also be to combine the plug-n'play design of morph.io with a Blockly environment for scrapers.
Let me know if you are also tinkering with such things, and see potential in a CKAN integration via ckanext-workflow or otherwise.
(5) Data catalogs in the age of misinformation
Screenshot of https://memes.sucho.org/
In general, the better we can recognize participation, the more the whole community will benefit through new incentives and structures. But why would people participate in the first place? I have been thinking a lot about the interfaces between data stewardship, volunteering, and the gig economy - and I think that having the right cause is a big driving factor.
Even though open data is often trumpeted, with perfectly good reason, as a weapon against misinformation (University of Bristol), we could pay more attention to features that make it easier to validate and compare sources.
See also my ODD'22 page Cultural Refuge and Hack4SocialGood, a research project culminating in an event in Switzerland at the end of March.
(6) Not all user journeys start at the landing page
Screenshot of DuckDuckGo
Of course, many do. Having access to web portal analytics - like some outstanding portals I know - lets us as a community better understand the "open data marketplace": which topics are trending, where gaps exist. At least, this was something my friend Konstantin and I suggested to a bunch of designers in 2016.
It is interesting how users interchange data platforms and websites, especially once a new generation of power-users is willing to go beyond the first or second link in a search result and explore the data in their own app. Understanding data reuse patterns among the apps and other downstream users of data takes this to the next level. I hear there is a lot cooking in this kitchen, and am keen to hear more.
Just as a historical side note: before web pages got "rich", having to spend time massaging the content - with the plethora of file formats, and archive formats (Wikipedia) on top of those - was a normal part of the BBS-era and even early Web experience. And also - remember these guys gracing the footer of every website in the 90's? They were sometimes linked to a full-fledged web analytics viewer, no (data protection) questions asked - hard to believe, today:
Screenshot of a Hit Counter generator via hostmysite.com
(7) No data is an island
Where data stewards meet data makers. Check out the full, original & delightful Data Access Map (CC BY), illustrated by Ian Dutnall for the ODI. I added the letters to invite more discussion of the role of public network infrastructure.
Wide-Area-Networks, from public WiFi to cheap mobile data plans, are enablers of whole classes of the economy. Yet access is not evenly distributed, as proponents of Digital Rights (EFF) or Information Justice (openfuture) will be keen to remind us. This is why protocols like LoRaWAN (Instructables), which help to democratise the maintenance of sensor networks and edge computing, are important investments.
See also ONIA, and look forward to hearing about the intersection of open data, IoT and security in Deborah Mesquita's csv,conf,v7 talk.
(8) Make excellent feedback loops
Image by @afsoonica from a MakeZurich project, as discussed here
Xeno-canto is an incredibly cool crowdsourced dataset of wildlife sounds, and is relied upon by thousands of projects like this one. One can easily be inspired by their flexible search syntax or the Data Mysteries page.
How could CKAN make it easier to contribute here, or start a new effort? How do we inspire and educate others in data stewardship? At what stage, and in what way, to best plug in data collection projects (Kobo, ODK, …)? Every good library should have a section dedicated to how to be a better writer (WikiHow). That's what I'm talkin' gaffing about.
(9) Pinboards? Rules! Constraints? Scores!
Screenshot of dribdat for makezurich.ch
Make making data great again: this is the goal of dribdat, a project I initiated and continue to support, just one of the many awesome tools available to hackathon organizers to make playful, legal, experimentative, inclusive hacking a fun and rewarding activity.
Applied correctly, it is a powerful community building tool. But we are all very much aware of hackathon fatigue, when things go out of balance. It is important to document our ideas, attempts, successes and failures, so that even the smallest contributions count.
In the latest release, dribdat automatically tries to enrich project descriptions with metadata from CKAN's API using ckan-embed. I would be happy to hear your experiences with documenting open data projects, and to create some playful (or ludic) experiences that make friends want to spend time together enjoying some delicious data cake in each other's company. No feathers ruffled.
Image from Opendata.swiss Handbook
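To make the enrichment idea above a bit more tangible, here is a minimal sketch in Python. It is not how dribdat or ckan-embed actually do it. It only assumes the standard response envelope of CKAN's Action API (`package_show` returns `success` and `result`), and that `title` is a plain string as in stock CKAN; some portals customise this. The function name `summarize_dataset` and the sample payload are made up for illustration.

```python
def summarize_dataset(payload: dict) -> str:
    """Turn a package_show response into a one-line project annotation.

    `payload` is the already-decoded JSON body of a call to
    <portal>/api/3/action/package_show?id=<dataset>.
    """
    if not payload.get("success"):
        raise ValueError("CKAN Action API call did not succeed")
    dataset = payload["result"]
    # Fall back to the URL slug when no human-readable title is set.
    title = dataset.get("title") or dataset.get("name", "untitled")
    license_title = dataset.get("license_title") or "license not specified"
    # Collect the distinct resource formats, ignoring empty values.
    formats = sorted(
        {res["format"].upper() for res in dataset.get("resources", []) if res.get("format")}
    )
    format_list = ", ".join(formats) or "none listed"
    return f"{title} ({license_title}; formats: {format_list})"


# Hypothetical payload, shaped like a package_show response.
sample = {
    "success": True,
    "result": {
        "name": "bird-counts",
        "title": "Bird counts",
        "license_title": "CC-BY 4.0",
        "resources": [{"format": "csv"}, {"format": "JSON"}, {"format": ""}],
    },
}
print(summarize_dataset(sample))
```

A line like this could sit next to a hackathon project description, so that the dataset's license and formats are visible without leaving the page.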
(10) Ask how does data really get used?
Screenshot of Hoppscotch
Iteratively, expressively. API first. This is engineering as it is practiced in the so-called industry. Data has never really been just about static files - and today, machines are still connected to each other by the loving grace of humans, with tools like the above.
Written by Liyas Thomas in I created Hoppscotch - Open source API development ecosystem
See also: CKAN OpenAPI viewer
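For those who prefer a script to a GUI client, "API first" with CKAN can start as small as the sketch below. It builds a `package_search` request against the Action API and reads the dataset names out of a decoded response. The portal address `https://ckan.example.org` and the helper names are hypothetical; the `q` and `rows` parameters and the `count` and `results` fields are those of the standard Action API.

```python
from urllib.parse import urlencode


def package_search_url(portal: str, query: str, rows: int = 5) -> str:
    """Build a package_search URL for a full-text query."""
    params = urlencode({"q": query, "rows": rows})
    return f"{portal.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_names(payload: dict) -> tuple[int, list[str]]:
    """Return the total hit count and the names on this page of results."""
    if not payload.get("success"):
        raise ValueError("CKAN Action API call did not succeed")
    result = payload["result"]
    return result["count"], [dataset["name"] for dataset in result["results"]]


# Hypothetical portal and response, for illustration only.
url = package_search_url("https://ckan.example.org/", "air quality", rows=2)
response = {
    "success": True,
    "result": {"count": 17, "results": [{"name": "air-quality-2022"}, {"name": "no2-sensors"}]},
}
print(url)
print(dataset_names(response))
```

Fetching the URL is left out on purpose, so the sketch runs offline; any HTTP client that returns the decoded JSON body will do.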
Image courtesy of The Federal Chancellery.
GovTech Hackathon - Opendata.ch
Screenshot of Wolfram Alpha.
Time is ticking … How do we make hackathons more fair?
So at the next hackathon: ask not what your data can do for you
Ask what you can do for your data!
Image generated using pokemon-stable-diffusion, trained by Justin Pinkney
Thanks.
This presentation by Oleg Lavrovsky is licensed under a
Creative Commons Attribution 4.0 International License.