Anaconda and Python Appeal to Data Scientists
Check out an interview that will help you understand the foundational principles of data science and the business problem you are trying to solve.
It was great speaking with Ian Stokes-Rees, Computational Scientist at Continuum Analytics, about their release of Anaconda version 4.4, available for both Python 2.7 and Python 3.6. Besides delivering a comprehensive platform for Python-centric data science with a single-click installer for Windows, Mac, Linux, and Power8, Anaconda 4.4 is also designed to simplify working with both Python 2 and Python 3 code.
Recent research finds that 89% of companies have at least one data scientist, but fewer than half have a data science team. These tools add a new level of appeal for enterprises that have yet to adopt data science. The conversation with Continuum Analytics focused on how data science can raise the enterprise's game and add a layer of intelligence critical to further success.
According to Ian, Anaconda is being downloaded about four million times per month, and Anaconda Enterprise layers on top of the core data science platform: a rich data science ecosystem built around Python and R.
Before Anaconda, Python users had to go to GitHub and download the bits and pieces of software needed to build a usable solution. Anaconda brought the pieces together, giving developers a single-click installer with 2,000 curated and managed packages plus another 100,000 contributed by the community.
What are the keys to a successful big data and data science strategy?
The path to success with big data and data science is finding patterns, with people having appropriate access to the data. This requires a data lab environment for exploratory data analysis. Exploratory analysis enables offline processing to discover what you could do if you had automated workflows. This opens up data science and provides the flexibility to process, manage, and organize the data.
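Exploratory analysis usually begins with simple summaries that hint at where patterns might be. As a minimal sketch, assuming the data arrives as a list of Python dicts, the standard library is enough for a first pass. The `summarize` function and the sample records are hypothetical, for illustration only.

```python
import statistics
from collections import Counter


def summarize(rows):
    """Return per-column summaries for a list of dict records.

    Numeric columns get count/mean/median/stdev; other columns get their
    most common values, a quick first look at where patterns might be.
    """
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)

    summary = {}
    for name, values in columns.items():
        if all(isinstance(v, (int, float)) for v in values):
            summary[name] = {
                "count": len(values),
                "mean": statistics.mean(values),
                "median": statistics.median(values),
                # stdev needs at least two data points
                "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
            }
        else:
            summary[name] = {
                "count": len(values),
                "top": Counter(values).most_common(3),
            }
    return summary


if __name__ == "__main__":
    # Hypothetical sample records, for illustration only.
    sales = [
        {"region": "east", "units": 12},
        {"region": "west", "units": 7},
        {"region": "east", "units": 9},
        {"region": "north", "units": 4},
    ]
    for column, stats in summarize(sales).items():
        print(column, stats)
```

Libraries such as pandas (for example, `DataFrame.describe()`) offer richer versions of these summaries; the point of the sketch is that a first look at the data needs very little machinery.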
How can companies get more out of their data with data science?
Build a data lab that enables exploratory data science, where a team can partner with different parts of the business to understand data sources, the problems to be solved, business strategies, and opportunities with cross-functional teams. In an exploratory data lab, data scientists who collaborate and connect with business owners multiple times a day can provide timely feedback on what is and isn't working with regard to business needs.
What are the most common issues you see preventing companies from realizing the benefits of big data and data science?
IT teams launch big Hadoop clusters without knowing anything about the business problem to be solved and without the people who can leverage the data being provided. A lot of companies try Hadoop and walk away after a few months, having wasted a lot of time and money. The steps with Anaconda are much more iterative, enabling people to get up and running in hours rather than days, weeks, or months. Corralling the data is the first step, but don't bog things down by putting everything in one place. Think through the process, understand the value you are trying to get, and have a team that can ensure you get that value.
What are some real-world problems your clients are able to solve with Anaconda?
We give customers in financial services and business data analysis who rely on SAS or MATLAB an open data science platform that lets them start analysis and workflows on a local system. Developers can start on a single machine, migrate to a server, and scale out onto clusters, so they can get responses in milliseconds and reproduce analytics rapidly without having to focus on the software chain and its individual components. With Anaconda, the focus moves away from proprietary components.
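One general way to keep the same analysis code as work moves from a single machine to a server to a cluster is to write it against an abstract execution interface. The sketch below is not Anaconda's mechanism; it assumes Python 3 and uses the standard library's `concurrent.futures.Executor`, and the function names and the sum/count partitioning are hypothetical. Cluster schedulers (Dask's distributed scheduler, for instance) may be able to supply a compatible executor; check their documentation for the specifics.

```python
from concurrent.futures import Executor, ThreadPoolExecutor


def analyze_partition(values):
    """Hypothetical per-partition step: return (sum, count) for one chunk."""
    return sum(values), len(values)


def partitioned_mean(partitions, executor: Executor):
    """Combine per-partition results into a global mean.

    Only Executor.map is used, so the same function runs unchanged on
    a thread pool, a process pool, or any other object offering a
    compatible map() method.
    """
    results = list(executor.map(analyze_partition, partitions))
    total = sum(s for s, _ in results)
    count = sum(c for _, c in results)
    if count == 0:
        raise ValueError("no data in any partition")
    return total / count


if __name__ == "__main__":
    # Hypothetical data split into three partitions.
    chunks = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
    # On a single machine, a thread pool is enough; moving to processes
    # or a cluster changes only the executor that is passed in.
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(partitioned_mean(chunks, pool))  # 5.0
```

Returning a (sum, count) pair rather than a per-partition mean keeps the final combination exact when partitions differ in size.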
What does the future hold for data science?
More qualified people will dive into data science. Capturing data is easy; you still need to establish patterns and best practices. Tableau's point-and-click data analytics are heading in the right direction, and many people are looking to use simple programmatic interfaces.
Over the next two to five years, we'll see increasing adoption of Python and R, along with standard patterns and workflows, until the next Tableau emerges. This could come from graphical analytics tools such as Cognos or Power BI.
What do developers need to be successful working on big data and data science projects?
Differentiating with data science requires an understanding of machine learning models that make use of big data. A lot of data scientists don't understand basic statistics and what's going on behind the scenes; for example, they use TensorFlow because it's from Google rather than a simple linear model and computational neural networks. Developers need to understand the foundational principles: go beyond knowing the tools and how to apply them, and begin applying data science skills. Manage your work in a more disciplined way, and document what you do. SAS developers don't understand SAS code. Core software engineers need to be prepared to look for and adopt open-source solutions.
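The interview contrasts reaching for TensorFlow with starting from a simple linear model. As a minimal sketch of that baseline, the code below fits ordinary least squares for a single predictor in plain Python so that every step is visible; the function names and sample data are hypothetical. In practice a numerical library would do this, but knowing the closed form makes it easier to judge whether a more complex model is actually earning its keep.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two or more paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(xs, ys, intercept, slope):
    """Fraction of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    if ss_tot == 0:
        raise ValueError("y values must not all be equal")
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical data that lies exactly on y = 1 + 2x.
    xs, ys = [1, 2, 3, 4], [3, 5, 7, 9]
    b0, b1 = fit_line(xs, ys)
    print(b0, b1, r_squared(xs, ys, b0, b1))  # 1.0 2.0 1.0
```

If a baseline like this already explains most of the variance, a deep learning framework may add complexity without adding much insight.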