NumHub: A Wikipedia for Data

Andrey Pyankov, founder of NumHub, explains his vision for sharing data.

By Jo Stichbury · Jan. 22, 19 · Interview

In this interview, I’m talking with Andrey Pyankov, founder of NumHub, which is a community-driven database of numbers, statistics, market research, industry metrics, and financial data. You can find numbers ranging from Google quarterly revenue to M&M's color distribution.

All the information on NumHub is gathered by a community of analysts and researchers, and you can request data be gathered for your research or presentations.

I spent a while looking through the data this week and found a few datasets I would have appreciated when I was working on a data science course and needed some inspiring data for project work. I liked the set of data about pet ownership in different European countries (who knew that, in Turkey, birds and fish are more common as pets than cats and dogs?). I also found a set of data about deaths from asthma in US states, which would have been interesting to use when I was learning R and using the standard dataset on particulate pollution.

But, without further ado, here is the interview...

Andrey, Can You Explain What NumHub Offers?

NumHub is like Wikipedia for numbers. Analysts, executives, and journalists can use it to look up any kind of number. You can also join as a contributor to add data to the database and verify existing datasets. This decentralized approach to data sourcing will allow us to gather a very broad dataset while maintaining its quality and update frequency.

Why Are You Setting Up NumHub? How Does It Differ From Google’s Dataset Search?

I worked as a financial analyst before setting up NumHub. We had access to all the paid services like Bloomberg, Euromonitor, etc., but I still had to search for many numbers online. It might be an industry-specific statistic, like the number of monthly users for social networks, or data for a non-developed country.

We want NumHub to be the Google Dataset Search for the long tail of statistics that are not yet available in structured form.

What Sources Do You Use and Do You Verify Them?

At the moment, we focus on publicly available sources so that any community member can independently verify the numbers and submit an update if there are discrepancies. We’ll probably have some credibility requirements like Wikipedia, but we don’t want to police it too much. As long as the numbers have a source link showing where they came from, users can decide whether they trust the source or not. We may eventually add community voting on the trustworthiness of different sources, but that’s not our first priority.

What Can a Data Scientist Do With the Data? Is It Free to Use?

We’re still figuring out the business model. There will definitely be a paid tier for enterprise customers. Individual users will have a free data allowance, which they will be able to increase by contributing data to the database and verifying existing numbers.

What Is Your Favorite Data in the Set So Far?

Probably the Star Wars characters, especially the depth you can find in the source link. We still have a lot of work to do adding all kinds of data like that to the database. (Jo comments: I’m really disappointed not to see C-3PO there. He’s my favorite!)

Star Wars Character Data

Finally, How Can I Find Out More? Can I Get Involved?

Leave your email to get invited to our contributor community here: https://numhub.co/join or just email me at andrey@numhub.co.

In Conclusion

I’d like to thank Andrey for taking the time to answer my questions and explain the product. NumHub is constantly expanding the range of topics it covers and looking for new community members, so do check it out if you want to get involved, and follow Andrey on Twitter for the latest news!

Opinions expressed by DZone contributors are their own.
