The Changing Landscape: Data Science Trends
An overview of some less-popular data science trends, such as natural language generation and automation.
Year after year, data science techniques mature and deliver strong results in successful implementations. 2016 ends with organizations embracing Big Data Analytics, Artificial Intelligence, and Data Science as key differentiators in their business processes. There have been various developments in data science and related technologies. We saw the number of Data Scientists grow across all professions and fields of study. Most consulting enterprises established Analytics and Data Science as one of their key offerings, and many niche startups sprang up to claim a space in this area.
One advantage we have is the increased contribution of Data Scientists to open source communities, which introduces new thinking in the analytics industry and brings out innovative ways to solve business problems. These factors, together with other emerging technologies such as cloud, big data, and mobile, have led to new trends in data science. Let us take a high-level look at how data science and its related technologies have evolved, and then discuss the emerging trends in the field:
Data Science: The term featured in a 1997 lecture by C.F. Jeff Wu titled “Statistics = Data Science?”. In 2008, DJ Patil and Jeff Hammerbacher first used the term “Data Scientists” to describe their teams. The year 2010 marked the rise of Data Science and Data Scientists, as enterprises started to put it into practice.
Big Data Analytics: The rise of data science is coupled with the growth of big data. Big data is a term coined for data with high volume, velocity, and variety. In the past, we had sufficient space to store big data, but limited means to analyze and process it. In 2003 and 2004, Google published research papers on the Google File System (GFS) and MapReduce, describing how to store and process big data. Later, in 2006, Yahoo developed its first prototype on Hadoop, which was open sourced and became the heart of big data analytics. For a data scientist, it is of the utmost importance to learn techniques for extracting intelligence from big data. This requires understanding and running machine learning algorithms and data science techniques on these big data analytics platforms and tools.
Cloud and Data Science: The cloud reduces the overall cost of infrastructure, software, and platforms. This makes it well suited to storing large data sets online and analyzing Big Data at reduced cost and with minimal maintenance overhead. Cloud-based data science and machine learning platforms let data scientists access, process, and analyze data where it is stored, in the cloud.
Data Science for Internet of Things (IoT): Data Science sits at the core of the Internet of Things, both to make the things smart and to capture insights from the connected things (sensors, actuators, and machines). The methodology for Data Science implementations in an IoT environment differs from traditional techniques. Gartner forecasts about 21 billion connected IoT devices by 2020, which makes it essential for a Data Scientist to learn the applications of Data Science in IoT environments.
Natural Language Processing (NLP) is the ability of a machine to comprehend human language. The input may be written text, speech, or video. Natural language doesn’t have any fixed structure, which makes the data difficult to store and process. NLP is a current hot topic, and there are several solutions in this space. However, there is still a lot more to achieve, and it remains a very active area of research.
Natural Language Generation (NLG) turns raw data into language that an audience with no knowledge of the raw data can understand. The simplest level of NLG is to turn a few data points into sentences. This is a niche area, and Data Scientists should learn the approach and techniques to embed NLG in analytics systems.
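As a rough illustration of that simplest level, the sketch below turns two data points into a sentence using a template. The function name `describe_sales`, the wording of the template, and the sample figures are all hypothetical and chosen only for illustration; real NLG systems go well beyond fixed templates.

```python
def describe_sales(region, current, previous):
    """Turn two data points into one sentence (template-based NLG sketch).

    Assumes `previous` is non-zero; all names and wording are illustrative.
    """
    change = (current - previous) / previous * 100
    if change == 0:
        return f"Sales in {region} were flat at {current:,} units."
    direction = "rose" if change > 0 else "fell"
    return f"Sales in {region} {direction} {abs(change):.1f}% to {current:,} units."


print(describe_sales("Europe", 1200, 1000))
print(describe_sales("Asia", 900, 1000))
```

The reader of the generated sentence never sees the raw numbers as a table; the template decides which comparison matters and how to phrase it.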
Deep Learning is a fast-growing field in Data Science. The ability of Deep Learning methods to learn complex nonlinear relations makes them stand out from traditional Machine Learning techniques. Neural networks are the precursors to deep learning. What makes deep learning different is the use of a large number of hidden layers, which has become possible due to the growth of computation power. TensorFlow and H2O.ai’s deep learning packages are open source and provide a good platform to start implementing deep learning algorithms.
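To see why hidden layers matter for nonlinear relations, here is a minimal pure-Python sketch of a network with one hidden layer that computes XOR, a function no single linear unit can represent. The weights are hand-picked for illustration rather than learned, and the helper names (`dense`, `xor_net`) are hypothetical; a real project would train the weights with a library such as TensorFlow.

```python
def relu(x):
    """Rectified linear unit: the nonlinearity between layers."""
    return max(0.0, x)


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W . inputs + b) for each unit."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Hand-picked (not learned) weights: two hidden ReLU units, one linear output.
HIDDEN_W = [[1.0, 1.0], [1.0, 1.0]]
HIDDEN_B = [0.0, -1.0]
OUT_W = [[1.0, -2.0]]
OUT_B = [0.0]


def xor_net(x1, x2):
    hidden = dense([x1, x2], HIDDEN_W, HIDDEN_B, relu)
    return dense(hidden, OUT_W, OUT_B, lambda z: z)[0]


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))
```

Removing the ReLU collapses the two layers into a single linear map, which is why stacking layers only adds expressive power when a nonlinearity sits between them.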
Reinforcement Learning: In this learning method, a system automatically tries to understand the situation, learn from its interactions, and choose the optimal path to attain its objective. It is based on a reward system in which the learner is not told what action to take but is rewarded when it makes the correct decision. This is similar to how a student learns: rewarded for excelling in an exam and penalized for failing it. This is a niche area for data scientists to explore and contribute to. You can learn more about reinforcement learning from the MIT Press home page for this book.
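The reward-driven loop described above can be sketched with tabular Q-learning, one common reinforcement learning algorithm (the article does not name a specific one). The environment here is a hypothetical five-cell corridor where the agent starts at cell 0 and earns a reward of 1 only on reaching cell 4; the constants and function names are assumptions made for illustration.

```python
import random

N_STATES = 5          # corridor cells 0..4; cell 4 is the goal (hypothetical toy task)
ACTIONS = (-1, +1)    # move left or move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2  # learning rate, discount, exploration rate


def step(state, action):
    """Move within the corridor; reward 1.0 only when the goal is reached."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def choose_action(q, state, rng):
    """Epsilon-greedy: explore sometimes, otherwise take a best-known action."""
    if rng.random() < EPSILON:
        return rng.choice(ACTIONS)
    best = max(q[state].values())
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = [{a: 0.0 for a in ACTIONS} for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            action = choose_action(q, state, rng)
            next_state, reward = step(state, action)
            # Q-learning update: nudge the estimate toward reward + discounted best future value.
            target = reward + GAMMA * max(q[next_state].values())
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Best action per non-terminal cell according to the learned values."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


print(greedy_policy(train()))
```

The agent is never told that moving right is correct; it discovers this only because moves that lead toward the goal end up with higher estimated value.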
Transfer Learning is a current hot topic for data scientists to explore. In this method, a new task is learned by transferring knowledge from a related task that has already been learned. This has immense application for reusing models across domains, especially in areas where data is sparse. For example, using transfer learning, one can develop a sentiment analysis model on products with abundant reviews and use that knowledge to develop the same type of model for other products with sparse reviews. You can learn more about this technique from the Transfer Learning chapter by Lisa Torrey and Jude Shavlik of the University of Wisconsin.
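The review example can be sketched in a few lines. The toy model below scores words by how often they appear in positive versus negative reviews, then reuses the scores learned on a data-rich source product as the starting point for a product with only two reviews. The data, the product categories, and the function names are all made up for illustration, and the word-count model is a deliberately simple stand-in for a real sentiment classifier.

```python
from collections import Counter


def train_word_scores(reviews, prior=None):
    """Score each word as (positive count - negative count), starting from `prior`."""
    scores = Counter(prior or {})
    for text, label in reviews:
        for word in text.lower().split():
            scores[word] += 1 if label == "pos" else -1
    return scores


def predict(scores, text):
    total = sum(scores.get(word, 0) for word in text.lower().split())
    if total > 0:
        return "pos"
    if total < 0:
        return "neg"
    return "unknown"


# Hypothetical source domain with plenty of reviews (phones).
source_reviews = [
    ("excellent battery and great screen", "pos"),
    ("terrible battery and poor screen", "neg"),
    ("great value excellent camera", "pos"),
    ("poor build terrible support", "neg"),
]
# Hypothetical target domain with sparse reviews (blenders).
target_reviews = [("quiet motor", "pos"), ("leaky lid", "neg")]

source_model = train_word_scores(source_reviews)
target_only = train_word_scores(target_reviews)
transferred = train_word_scores(target_reviews, prior=source_model)

print(predict(target_only, "terrible blender"))   # no evidence in the sparse data
print(predict(transferred, "terrible blender"))   # reuses what the source task learned
```

The target-only model has never seen the word "terrible", so it cannot decide; the transferred model inherits that knowledge from the source task and still adapts to target-specific words such as "quiet" and "leaky".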
Data Science Automation: Innovation in data science automation has begun and will evolve gradually in the years to come. We are currently at a stage where automation is being tackled for individual data science modules. From here, we need to move to a more generic data science platform, with all modules automated and integrated. I provided an overview of this in Data Science Automation for Big Data and IoT Environment.
There are never-ending discussions around “Data Scientists Automated and Unemployed by 2025!” New Data Scientists and emerging data technology professionals face the threat of being displaced by some other “Data X” role in the future. It is important to stay up to date and sharpen your skill set over time. Technologies are changing at a pace faster than life in New York and London. The best plan is to remain a data lover and be a data magician. Let the job role and title take their own route along with the trend.