Open Data workshop at #SIXHackathon

oleg · 8 February 2016 12:11

On this page we are documenting the Open Data track of the hackathon organized by the FinTech incubation branch of SIX Group at Schiffbau in Zurich and RainmakingLoft in London. See the official website for details. @sodacamper is running a workshop (slides available here) during the event on behalf of Opendata.ch, the Swiss chapter of Open Knowledge. We are bringing in data engineering and crowdsourcing know-how (see finance hackathons 2012, 2013, 2015), experience with coaching, and connections to global networks. Leave a comment on the forum or mention @sodacamper and #SIXHackathon if you need an answer urgently. See the section further down for a quickstart.

Hackathon datasets

Datasets supported at the event are documented here. Additional links to data sources, tips and tools will be posted in the comments thread below.

Financial data samples for SIX Hackathon

Extracted, aggregated, anonymized payment transactions from demo accounts during 01.2013-01.2016. [Details …]
Tick data for January 2015 of 20 Swiss BlueChip securities [Details …]

Format: CSV | Download: Bitbucket (0.5GB) | License: unlicensed*
(* not open data, supported only for hackathon participants)
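Since the download is a fairly large CSV, it may help to stream it row by row rather than load it all at once. Here is a minimal sketch of that idea, summarizing ticks per security. Note that the column names (`symbol`, `timestamp`, `price`, `volume`) and the sample rows are assumptions made up for illustration; check the dataset details for the real layout.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample: the real column names and values in the SIX files may differ.
SAMPLE_TICKS = """symbol,timestamp,price,volume
AAA,2015-01-05T09:00:01,100.0,10
AAA,2015-01-05T09:00:02,101.0,30
BBB,2015-01-05T09:00:01,50.0,20
"""


def summarize_ticks(csv_file):
    """Stream a tick CSV and return per-symbol tick count, volume and VWAP."""
    totals = defaultdict(lambda: {"ticks": 0, "volume": 0.0, "turnover": 0.0})
    # DictReader reads one row at a time, so memory use stays small.
    for row in csv.DictReader(csv_file):
        price = float(row["price"])
        volume = float(row["volume"])
        entry = totals[row["symbol"]]
        entry["ticks"] += 1
        entry["volume"] += volume
        entry["turnover"] += price * volume
    return {
        symbol: {
            "ticks": e["ticks"],
            "volume": e["volume"],
            # Volume-weighted average price; None if nothing was traded.
            "vwap": e["turnover"] / e["volume"] if e["volume"] else None,
        }
        for symbol, e in totals.items()
    }


summary = summarize_ticks(io.StringIO(SAMPLE_TICKS))
```

For a real file, the same function can be called on a file handle opened with `open(path, newline="")`, where `path` points at wherever you saved the download.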

Contact: Maneesh and Thomas Landis in Zürich, Sergey in London

Quickstart guide

Some tips to get you started with finding open data for your project:

1) Define areas of interest and expand your [terms]

For example, if you are planning to work on bitcoin, think of related topics like “alternative currencies”, “monetary exchange”, “trading platforms”, “e-commerce”. But also unrelated topics from “weather patterns” to “music tastes” can be thrown into this pot. Decide which ones are most likely to be a priority based on your interests and ambitions as a hackathon participant. Even if you don’t do any of the other steps, you will have a basic mindmap that will make it easier to choose a team and dive into data at the event. And if you need some extra inspiration, check out the new Open Banking Standard in the U.K.

2) Figure out where to start your [data] searches

Of course, search engines are also going to be useful for quickly navigating the world wild web of information. When it comes to data, sometimes you are more flexible on the topic and less flexible (due to timing and other constraints) about the accessibility or format of the data sources you will ultimately be able to use. Open data portals - and Open Government Data portals in Switzerland, U.K., Europe, USA… - can be a source of discovery. Check out OpenSpending, an independent catalogue of over 1’000 datasets on the way governments plan and spend money around the world. Visit OpenCorporates for a database of ~99M companies. See how to plug your app into Open Product Data. Get tips from the Swiss open data community on finance data and other topics.

3) Sketch your idea, learn the tools, [participate] in the community

You do not have to be a programmer to excel at a hackathon, and contributions in vision, design, research, planning, documentation and communications can make all the difference. Often it all starts with a simple hand-drawn sketch. But, in the long run, making an impact and impressing users does mean knowing at least where the technology is going (for example, see where OpenSpending is headed). The open data and open source movements are concerned with levelling the playing field: Web Standards and Open Source Software were made possible through contributions of massive virtual communities of developers and users. It takes at least as much drive and dedication to succeed with a business powered by open data - the network effects and new socioeconomic models are what make it worthwhile.

So get out there, build, explore, set up - comment below, tweet, blog, express your curiosity and you will learn a lot! Feel free to reach out for tips and advice along the way: that is what soda camp is for.

Glengarry Glen Ross © New Line Cinema 1992

oleg · 5 March 2016 13:17

The Etherpad service we were using is experiencing technical issues due to a huge amount of traffic in conjunction with International Open Data Day. Here are the latest Q&A responses; please log in and use this forum to post more questions:

How to publish and where to find frictionless data and metadata

http://data.okfn.org/

Where to get free software for exploring/wrangling/cleaning tabular data

https://github.com/OpenRefine/OpenRefine/releases

How to scrape data from web pages

Is it possible to use blockchain techniques for opening data?

Where to find data on ATM locations and attributes about them from around the world?

What does open stock market / index data look like?

http://data.okfn.org/data/core/s-and-p-500

Where to find population statistics and other census datasets?

Does open data exist on connected devices?

Is there open data on cellular network depending on location?

How to send notification messages with peer to peer Android apps?

Where to find open news content relevant to finance management?

How to communicate distributed computing architectures?

Open data sets on geographical occurrence of road accidents and other injuries

oleg · 5 March 2016 18:30

On Saturday evening I went around the teams in Zürich (at the request of the organisers) to do a quick survey. I asked how many were using the SIX sample dataset (19% yes or considering), how many were using open data (46% yes or considering), how confident people felt about their project on a scale from 1 (low confidence) to 5 (very confident) (average: 4.12, median: 4), and what their development stack was (NB: several teams chose not to answer). From the answers to the last question I made this word cloud of the most popular technologies midway through the hackathon:

It would be great to hear how things are coming along in London as well, and I look forward to more questions from the participants as we get closer to the deadline!

oleg · 6 March 2016 12:22

I do not often get the privilege and responsibility of judging hackathon projects, and in the events I organise there is rarely an evaluation component. Before we begin, I would like to make a brief note of how I will evaluate presentations in the ‘Open Data’ category. I will rank them according to three aspects.

Demo-effect: how much effort and ingenuity was put into integrating data sources and deriving insights from data.
Open business potential: appraisal of the data your application produces, or potential in an open ecosystem.
Practical benefits: how well you convince me of the impact your project will have on people’s lives.
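For teams who want to self-assess against these three aspects, here is a rough sketch of how such a ranking could be tallied. It is not the scoring actually used at the event: the 1-5 scale, the equal weighting, and the team names are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    # Each aspect scored on an assumed 1-5 scale.
    demo_effect: int
    open_business_potential: int
    practical_benefits: int

    def total(self) -> int:
        # Assumption: all three aspects carry equal weight.
        return (
            self.demo_effect
            + self.open_business_potential
            + self.practical_benefits
        )


def rank_teams(scores_by_team):
    """Return team names sorted by total score, highest first.

    Ties are broken alphabetically so the order is deterministic.
    """
    return sorted(
        scores_by_team,
        key=lambda team: (-scores_by_team[team].total(), team),
    )


# Hypothetical teams and scores, for illustration only.
example = {
    "Team A": Scores(demo_effect=4, open_business_potential=3, practical_benefits=4),
    "Team B": Scores(demo_effect=5, open_business_potential=4, practical_benefits=3),
    "Team C": Scores(demo_effect=3, open_business_potential=3, practical_benefits=3),
}
ranking = rank_teams(example)
```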

Good luck everyone!

oleg · 6 March 2016 17:50

Congratulations to all the teams for making it to the finish line, overcoming the 36 hour pressure cooker and coming out with a lot of fabulous ideas and compelling demos. I was also put under extraordinary pressure to make a decision for the Open Data workshop award, and I attribute keeping my senses in the final minutes to the fact that I was using data to track and quickly calculate essential variables. Congratulations again to team InvestoBot!

Many of the 14 teams I interviewed (out of 40 overall) were, in fact, exemplary in their use of open data and thoughts about publishing data and creating platforms; it goes without saying that, if I could, I would have given all of you a prize. But I trust that what you have learned, connections made and the experience gained will, in the long run, be the best reward.

As a follow-up to the hackathon I had another interview with the InstaBot team. Here is what they had to say about the experience:

InstaBot’s team first got together when the theme of the hackathon was announced. Each of the members of the team is quite interested in the stock market - something that thousands of people participate in, without the possibility of completely understanding the rules or the possibility of predicting outcomes. They say the finance theme is special: few hands-on events like this exist, as hackathons are usually open themed. It was also their first hackathon working as a team.

Why do you care about Fintech? Power is shifting to the community rather than being centered on large corporations, with smaller players controlling the banking landscape in the future. They work in the belief that “decentralised banking” can lead to more innovation. One of them already made a video about Artificial Intelligence in the financial industry during a course. They are Computer Science students interested in machine learning, something that needs a lot of data.

They learned a new library for their project, had an excellent experience, and the teamwork worked great. But unfortunately there is no time for them to make a startup now, as they are first year students, only in London since September.

The project’s backend code is in Python and open source. They get some runtime data from IBM, otherwise the code is self running. They took a look at some neural network papers (accuracy: 30%) and, through Google Scholar, found some work on stock market prediction. They started using sentiment analysis with Bluemix/Watson, and Google Finance for stock prices, which are directly connected to news (top 10 articles about the company stock).

Their project was founded on the premise that the data is open. Small organizations are relying on this kind of data, and they imagine that making it private would be a disaster that wouldn’t just affect their hack. They also looked at Yahoo Finance and mentioned that a lot of other APIs can do a similar job. Many NASDAQ companies even publish their own data using their own APIs.

They were inspired by the journalist from WIRED who spoke at the start of the hackathon. He talked about how this could change a lot in user interaction - a powerful idea…

They didn’t have time to look at standards, as they were too focused during the hackathon. Some of the other challenges: they started with Facebook Messenger (private chat), but the API was not working, and after 4-5 hours they gave up. Twitter’s API worked much better.

They had accuracy problems: the sentiment API was giving really high accuracy, which was not plausible, and initially they did not take the sentiment into account. Confidence was too high because the network started overfitting the data, and the time-based prediction was “too perfect”. This was important because in all the papers they looked at, the rate of success is based on back-predicting. They said that getting started with Bluemix/Azure is quick but the UI is messy. It takes a long time to find what you need and set up the service. It took them much longer to find where on the website the username and password were (even the helper from IBM had trouble finding it).
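One common safeguard against “too perfect” time-based predictions is to evaluate only on data that comes strictly after the training period, so the model cannot peek at the future. The sketch below illustrates that idea in plain Python. It is not the team’s code: the function names and the “up”/“down” labels are assumptions for illustration.

```python
def chronological_split(samples, test_fraction=0.2):
    """Split time-ordered samples so the test set is strictly later in time.

    `samples` must already be sorted from oldest to newest. Unlike a random
    shuffle, this keeps future observations out of the training set.
    """
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    if len(samples) < 2:
        raise ValueError("need at least two samples to split")
    n_test = max(1, round(len(samples) * test_fraction))
    cut = len(samples) - n_test
    return samples[:cut], samples[cut:]


def direction_accuracy(predicted, actual):
    """Fraction of predictions whose direction label matches the actual one."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for p, a in zip(predicted, actual) if p == a)
    return hits / len(actual)


# Stand-in for ten time-ordered observations (oldest first).
days = list(range(10))
train, test = chronological_split(days, test_fraction=0.2)

# Hypothetical predictions versus actual outcomes on the held-out period.
accuracy = direction_accuracy(
    ["up", "down", "up", "up"],
    ["up", "up", "up", "down"],
)
```

If accuracy on the later, held-out period is much lower than on the training period, that gap is a sign of the overfitting described above.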

But… on Saturday afternoon everything was ready to test, they finished work before midnight and started preparing for the pitch. After debugging so many issues it was time for the 2 minute pitches.

What made your project a success? “We tried not to aim it at similar people to ourselves (i.e. same background). Our user is anybody who doesn’t have a lot of technical knowledge, just a normal person who is on the go and wants to invest.” They want quick short info on their phone, and this idea resonated with some people. If people can see that our prediction is accurate, we would influence their decision.

Next steps: a subscription model (“we would give you extra data”), private message capabilities, and empowering the investor rather than investing on their behalf. The team is based in London and can surely develop it further: they have access to the resources and can learn more about the market, how their startup could function, and what the first steps would be in developing a company around the bot. The IBM mentor was constantly asking technical questions; there was initial encouragement, but besides tips on how to improve it, they didn’t get much business advice.

“We’ll see, after the exams…”

Good luck, guys!