Cultivating Data Scientists
How can you leverage all their expertise and best tap the value of your data and use it to make money? By cultivating data scientists!
As an example, my company, PCH/Media, has access to an enormous amount of data. Our parent company, Publishers Clearing House, does over $1 billion in e-commerce each year, has relationships with about 75% of U.S. households, and maintains over 100 million user profiles. We have people working hard to make this data actionable, and my unit is charged with proving the business value of new ideas by collecting or analyzing data.
Yet, "value" is subjective. It is slippery. Different points of view color the worth of the data, even within an organization. Some workers, like your web team, collect data as a byproduct of their work. Others, like your business analysts, crunch data and send out reports. Other teams are tasked with increasing performance, reducing costs, or optimizing the user experience. How can you leverage all their expertise and best tap the value of your data and use it to make money?
By cultivating data scientists!
That is not to say you don't already have good, reliable people dissecting your data every day. The challenge is how to help them. How do you measure, discuss, and grow analytic capabilities and data scientists?
With a data science yardstick.
A Rubric for Data Analysts and Data Scientists
Here's my cut at it: the Data Analysts and Data Scientists Rubric.
This document was inspired by our Software Engineers Rubric, developed by my colleague Josh Begleiter. You will see plenty of overlap between the two. Hopefully, you will find these documents useful for planning, hiring, and developing your employees, too.
The Data Analysts and Data Scientists Rubric is intended to answer questions like:
- What’s the difference between junior data analysts, data analysts, and data scientists?
- What are data analysts and data scientists supposed to know?
- Where does my knowledge place me?
Of course, there is a lot of room for debate here: adapt the rubric to fit your needs. In fact, please let me know how you improve the rubric — I may decide to adopt some of your changes.
Data Scientists Do Not Have to Be Programmers
One decision may surprise some readers: data analysts and data scientists need not be programmers. Of course, they write queries and Excel formulas and may need to write scripts to manipulate data. However, software development work, even if just for internal tools, should be the responsibility of people who have the training and the mindset to write and maintain production code. Most data teams are big enough now to let people focus on what they are good at, with data scientists and software engineers collaborating.
Instead, data analysts and data scientists should focus on extracting value from data.
It's All About That Data
The Data Analysts and Data Scientists Rubric describes employees' relationship to data. In brief:
- Junior data analysts manipulate and report on existing data.
  - They answer: "What does the data say?"
  - They use tools like database systems (SQL), Excel, and statistics to do their work.
- Data analysts add experience and industry-specific knowledge.
  - They answer: "What does the data mean?" and "What should we do about it?"
  - They usually have Big Data experience, the ability to assess confidence, and the depth of knowledge to know how the organization should react.
- Data scientists add vision and insight to go beyond the actual data.
  - They answer: "What will the data look like in the future?" and "How does the data help us position our offerings for success?"
  - They have lots of data experience, the ability to infer conclusions from the data, and the ability to assess risks and rewards.
The last category could have been called "Senior Data Analysts", but the industry has adopted "Data Scientist" instead. This is fine, since it implies the ability to envision and test new theories about data.
The Rubric Will Need to Change with the Times
Of course, new technologies for Big Data, machine intelligence, and extracting data value are being invented every day. These changes will affect what a data analyst or data scientist is called on to do. For now, though, the rubric is a starting point: a place to look for important techniques, models, approaches, and mindsets. Whether you are an employer or an employee, an analyst or a scientist, use it to think about cultivating data science skills.
Investment in human resources around data is one bet sure to pay off.