Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

New Focus: Data Scientist

DZone's Guide to

New Focus: Data Scientist

· Big Data Zone ·
Free Resource

Hortonworks Sandbox for HDP and HDF is your chance to get started on learning, developing, testing and trying out new features. Each download comes preconfigured with interactive tutorials, sample data and developments from the Apache community.

Read this: http://www.forbes.com/sites/emc/2014/06/26/the-hottest-jobs-in-it-training-tomorrows-data-scientists/

Interesting subject areas: Statistics, Machine Learning, Algorithms.

I've had questions about data science from folks who (somehow) felt that calculus and differential equations were important parts of data science. I couldn't figure out how they decided that diffeq's were important. Their weird focus on calculus didn't seem to involve using any data. Odd: wanting to be a data scientist, but being unable to collect actual data.

Folks involved in data science seem to think otherwise. Calculus appears to be a side-issue at best.

I can see that statistics are clearly important for data science. Correlation and regression-based models appear to be really useful. I think, perhaps, that these are the lynch-pins of much data science. Use a sample to develop a model, confirm it over successive samples, then apply it to the population as a whole.

Algorithms become important because doing dumb statistical processing on large data sets can often prove to be intractable. Computing the median of a very large set of data can be essentially impossible if the only algorithm you know is to sort the data and find the middle-most item.

Machine learning and pattern detection may be relevant for deducing a model that offers some predictive power. Personally, I've never worked with this. I've only worked with actuaries and other quants who have a model they want to confirm (or deny or improve.)

Hortonworks Community Connection (HCC) is an online collaboration destination for developers, DevOps, customers and partners to get answers to questions, collaborate on technical articles and share code examples from GitHub.  Join the discussion.

Topics:

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}