
Data science marketplaces



Several new websites offering “marketplaces” for data science are being established. Two I’ve come across recently are Experfy and SnapAnalytx.

Experfy provides a way for companies to find statisticians and other data scientists, either for short-term consultancies or to fill full-time positions. They describe their “providers” as “Data Engineers, Data Scientists, Data Mining Experts, Data Analyst/Modelers, Big Data Solutions Architects, Visualization Designers, Statisticians, Applied Physicists, Mathematicians, Econometricians and Bioinformaticians.” Data scientists can sign up as “Providers”, and companies can sign up as “Clients”.

Experfy makes money by taking a cut of any transaction made on the site.

Providers are subject to a selection process, although it is not clear what the criteria are. I heard about Experfy because they invited me to be a provider. I have far too much work already without needing more, but it looks like an interesting and useful service.

Experfy is a more specialized version of Zombal, which covers science professionals generally and not just data scientists.

SnapAnalytx takes a different approach: data scientists post their algorithms, which are then hosted on the site. Anyone wanting to use an algorithm can then upload their own data, train the model and get predictions. The “model author” can interact with the users on the site. So essentially SnapAnalytx provides the cloud hosting and computing infrastructure, and a way for data scientists and their clients to interact online.

The “unit” for sale with SnapAnalytx is a model or algorithm, whereas the “unit” for sale with Experfy is a person.

I imagine that the SnapAnalytx approach would be more suited to some problems than others. My algorithm for hierarchical forecasting would probably work well on such a platform as it takes a lot of computing power (for large hierarchies) and is suited to parallel processing. (I assume SnapAnalytx allows multiple processes.) It also works out-of-the-box for a lot of problems.

On the other hand, my algorithm for electricity demand forecasting would probably not work well on this platform as we have to tailor the model carefully to each particular region, so having a generic cloud-hosted algorithm is unlikely to give effective forecasts.

SnapAnalytx makes money from both model providers and model users. They charge providers $99 per month per model to list models in the catalog (which seems to me like a huge cost, but perhaps there is a lot of manual work in setting up every model), and then each user is charged a fee for using the model. SnapAnalytx retains part of the user fees, and the rest goes to the model provider.
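To get a feel for what that listing fee means for a provider, here is a rough back-of-the-envelope sketch in Python. Only the $99 per month per model comes from the description above; the per-use fee and the provider's share of it are not stated anywhere I have seen, so `fee_per_use` and `provider_share` are purely hypothetical inputs, and the function names are my own.

```python
import math

# From the pricing described above: $99 per month per listed model.
LISTING_FEE_PER_MONTH = 99.0


def provider_net_income(uses_per_month, fee_per_use, provider_share):
    """Monthly net income for a provider listing a single model.

    fee_per_use and provider_share are hypothetical: the actual user
    fees and revenue split are not given in the text.
    """
    if not 0.0 < provider_share <= 1.0:
        raise ValueError("provider_share must be in (0, 1]")
    # The provider keeps only their share of what users pay ...
    provider_revenue = uses_per_month * fee_per_use * provider_share
    # ... and pays the flat listing fee regardless of usage.
    return provider_revenue - LISTING_FEE_PER_MONTH


def break_even_uses(fee_per_use, provider_share):
    """Smallest number of paid uses per month that covers the listing fee."""
    if fee_per_use <= 0 or not 0.0 < provider_share <= 1.0:
        raise ValueError("fee_per_use must be positive and provider_share in (0, 1]")
    return math.ceil(LISTING_FEE_PER_MONTH / (fee_per_use * provider_share))
```

For example, if users paid $2 per use and the provider kept half, `break_even_uses(2.0, 0.5)` says a model would need 99 paid uses a month just to cover its listing fee. The real numbers could look quite different depending on the actual fees and split.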

It will be interesting to see whether these marketplaces survive, and whether any competition develops. Feel free to add links to competing services, or other data science marketplaces, in the comments below.



Published at DZone with permission of

Opinions expressed by DZone contributors are their own.
