Synthetic is a kind of open data

oleg · 11 March 2022 16:49

Synthetic data, not to be confused with synthetic biology, is data that "are not obtained by direct measurement". This is one of the many excellent topics debated by students in the University of Bern course on Open Government Data, which it is my pleasure to support.

AIcrowd.com, with the slogan "Crowdsourcing A.I. to solve real world problems", and many other projects in the A.I. space make a compelling case for using open ecosystems to train neural networks on synthetic data sources, and to generate new ones. What is the interplay of open and synthetic data?

When we think about polished, published, public data, we think about many of the same things that go into creating high-quality data products. I have not seen much conversation about this in our community. If you have any thoughts, experiences or links to share, please do.

Perhaps a dataset to track and evaluate such sources could be a good starting point? See my longer blog post and let me know if you would like to discuss this at Opendata.ch/2022 or elsewhere.