NumHub: A Wikipedia for Data

Andrey Pyankov, founder of NumHub, explains his vision for sharing data.

In this interview, I’m talking with Andrey Pyankov, founder of NumHub, which is a community-driven database of numbers, statistics, market research, industry metrics, and financial data. You can find numbers ranging from Google quarterly revenue to M&M's color distribution.

All the information on NumHub is gathered by a community of analysts and researchers, and you can request that data be gathered for your own research or presentations.

I spent a while looking through the data this week and found a few datasets I would have appreciated when I was working on a data science course and needed some inspiring data for project work. I liked the set of data about pet ownership in different European countries (who knew that, in Turkey, birds and fish are more common as pets than cats and dogs?). I also found a set of data about deaths from asthma in US states, which would have been interesting to use when I was learning R and using the standard dataset on particulate pollution.

But, without further ado, here is the interview...

Andrey, Can You Explain What NumHub Offers?

NumHub is like Wikipedia for numbers. Analysts, executives, and journalists can use it to look up any kind of number. You can also join as a contributor to add data to the database and verify existing datasets. This decentralized approach to data sourcing will allow us to gather a very broad dataset while maintaining quality and update frequency.

Why Are You Setting Up NumHub? How Does it Differ From Google’s Dataset Search?

I worked as a financial analyst before setting up NumHub. We had access to all the paid services like Bloomberg, Euromonitor, etc., but I still had to search for many numbers online. These might be industry-specific statistics, like the number of monthly users for social networks, or data for less-developed countries.

We want NumHub to be like Google Dataset Search, but for the long tail of statistics that are not yet available in structured form.

What Sources Do You Use and Do You Verify Them?

At the moment, we focus on publicly available sources, so that any community member can independently verify the numbers and submit an update if there are discrepancies. We’ll probably have some credibility requirements like Wikipedia, but we don’t want to police it too much. As long as the numbers have a source link showing where they came from, users can decide whether they trust the source or not. We may eventually add some community voting on the trustworthiness of different sources, but that’s not our first priority.

What Can a Data Scientist Do With the Data? Is it Free to Use?

We’re still figuring out the business model. There will definitely be a paid tier for enterprise customers. For individual users, we’ll have a free data allowance. Users will also be able to increase their data allowance by contributing data to the database and verifying existing numbers.

What Is Your Favorite Data in the Set So Far?

Probably the Star Wars characters, especially the depth you can find in the source link. We still have a lot of work to do adding all kinds of data like that to the database. (Jo comments: I’m really disappointed not to see C3PO there. He’s my favorite!)

Star Wars Character Data

Finally, How Can I Find Out More? Can I Get Involved?

Leave your email at https://numhub.co/join to get invited to our contributor community, or just email me at andrey@numhub.co.

In Conclusion

I’d like to thank Andrey for his time in answering my questions and explaining the product. NumHub is constantly expanding the range of topics they cover and looking for new community members, so do check them out if you want to get involved, and follow Andrey on Twitter for the latest news!
