Am I a data scientist?

Last night I gave a very short talk (less than 5 minutes) at the Melbourne Analytics Charity Christmas Gala, a combined event of the Statistical Society of Australia, Data Science Melbourne, Big Data Analytics and Melbourne Users of R Network.

This is (roughly) what I said.


Statisticians seem to go through regular periods of existential crisis as they worry about other groups of people who do data analysis. A common theme is: all these other people (usually computer scientists) are doing our job! Don’t they know that statisticians are the best people to do data analysis? How dare they take over our discipline!

I take a completely different view. I think our discipline is in the best position it has ever been in. The demand for data analysis skills is greater than ever. Our graduates are highly sought after, and well paid. Being a statistician has even been described as a sexy profession (which presumably is a good thing to be!).

The different perspectives are all about inclusiveness. If we treat statistics as a narrow discipline, fitting models to data, and studying the properties of those models, then statistics is in trouble. But if we treat what we do as a broad discipline involving data analysis and understanding uncertainty, then the future is incredibly bright.

Here are two quotes from well-known bloggers in the last year or two:

April 2013: Larry Wasserman blog
Data science: the end of statistics?
If you’re analyzing data, you’re doing statistics. You can call it data science or informatics or analytics or whatever, but it’s still statistics.

November 2013: Andrew Gelman blog
Statistics is the least important part of data science
There’s so much that goes on with data that is about computing, not statistics. I do think it would be fair to consider statistics as a subset of data science …

Statistics is important, don’t get me wrong, statistics helps us correct biases … estimate causal effects … regularize so that we’re not overwhelmed by noise … fit models … visualize data … I love statistics! But it’s not the most important part of data science, or even close.

How can two professors of statistics have such different views on their discipline? The same perspectives can be seen in the following two diagrams (both reproduced with permission).

Source: Drew Conway, Sept 2010. Reproduced under a Creative Commons Licence.


In the first narrow view, to be a data scientist you have to know a great deal about statistics, mathematics, computer science, programming, and the application discipline. If that’s true, I’ve never met a data scientist. I don’t believe they exist.

In the second broader view, everyone here is a data scientist, although we have different specializations and different perspectives and training.

I take the broad inclusive view. I am a data scientist because I do data analysis, and I do research on the methodology of data analysis. The way I would express it is that I’m a data scientist with a statistical perspective and training. Other data scientists will have different perspectives and different training.

We are comfortable with having medical specialists, and we will go to a GP, endocrinologist, physiotherapist, etc., when we have medical problems. We also need to take a team perspective on data science.

None of us can realistically cover the whole field, and so we specialise in certain problems and techniques. It is crazy to think that a doctor must know everything, and it is just as crazy to think a data scientist should be an expert in statistics, mathematics, computing, programming, the application discipline, etc. Instead, we need teams of data scientists with different skills, each aware of the boundary of their expertise and of who to call in for help when required.

Let’s not be too sectarian about our disciplines, thinking everyone not trained in the same way we were is a heretic.

It reminds me of a famous joke, written by comedian Emo Philips:

I was walking across a bridge one day, and I saw a man standing on the edge, about to jump off. I immediately ran over and said “Stop! Don’t do it!”
“Why shouldn’t I?” he said.
I said, “Well, there’s so much to live for!”
“Like what?”
“Well … are you religious or atheist?”
“Religious.”
“Me too! Are you Christian or Jewish?”
“Christian.”
“Me too! Are you Catholic or Protestant?”
“Protestant.”
“Me too! What franchise?”
“Baptist.”
“Wow! Me too! Northern Baptist or Southern Baptist?”
“Northern Baptist.”
“Me too! Are you Northern Conservative Baptist or Northern Liberal Baptist?”
“Northern Conservative Baptist.”
“Me too! Are you Northern Conservative Fundamentalist Baptist or Northern Conservative Reformed Baptist?”
“Northern Conservative Fundamentalist Baptist.”
To which I said, “Die, heretic scum!” and pushed him off.


Published at DZone with permission of Rob J Hyndman, DZone MVB. See the original article here.

