
Data Science: How to Get the Most Out of Data, Science, and Technology


In this article, I am going to summarize the four major components of data science: data, science, technology, and business.


I recently had a chance to meet Dr. Shahzad Cheema, a Lead Data Scientist at IBM's IoT Industry Lab in Munich. We had an interesting discussion about data science and its applications in the real world. According to Dr. Cheema, data science is probably the most fascinating and least understood field in IT. Luckily, we are out of the "hype" phase for big data, as we are already witnessing its adoption and acceptance in almost all industries. Like the Industrial Revolution, big data will continue to drive technological change in many different forms. All of the "smart" features showing up in products today are based on analytics and data, which is proof that data science is a key foundation for both business and technological innovation.

So, what exactly is data science? Data science is an interdisciplinary field that combines data, science, technology, and business impact. It usually employs sophisticated tools and techniques to extract knowledge and actionable insights from structured or unstructured data in order to optimize business objectives. The business value that comes out of that process is what matters most.

Wikipedia defines it as "a field of scientific methods, processes, algorithms, and systems to extract knowledge or insights from data in various forms, either structured or unstructured, like data mining." While this definition of data science is widely accepted, its implications and implementations in the real world remain a bit of a mystery. To get down to the business implications, we need to better understand the main building blocks of data science and how they are tied together. In this article, I am going to summarize our discussion around the four major components of data science: data, science, technology, and business.

Data

Data is the most important component in data science. What matters isn't the size of the data (the term "big" is relative, anyway) but how it's used. This idea has been given a more sensible name: smart data. While the four famous Vs (volume, velocity, variety, and veracity) describe the underlying landscape of big data, it's the value that matters in the end. Velocity alone can make data very difficult to maintain and analyze once it exceeds two million records per day. Feature engineering, i.e. creating meaningful and useful attributes from raw data, is a key trend in the space. Another key trend is handling unstructured data by letting powerful machine learning models, such as deep neural networks, learn the features as part of the model itself.
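To make feature engineering a little more concrete, here is a minimal Python sketch. It is an illustration only: the purchase records, the `build_customer_features` helper, and the chosen attributes are all hypothetical and were not part of the discussion with Dr. Cheema. It shows raw transaction rows being turned into per-customer attributes that a model could consume.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw records: (customer_id, purchase_date, amount).
RAW_PURCHASES = [
    ("c1", date(2018, 1, 5), 20.0),
    ("c1", date(2018, 2, 9), 35.0),
    ("c2", date(2018, 1, 20), 120.0),
]


def build_customer_features(purchases, as_of):
    """Aggregate raw purchase rows into one feature dict per customer."""
    grouped = defaultdict(list)
    for customer_id, purchase_date, amount in purchases:
        grouped[customer_id].append((purchase_date, amount))

    features = {}
    for customer_id, rows in grouped.items():
        amounts = [amount for _, amount in rows]
        last_purchase = max(purchase_date for purchase_date, _ in rows)
        features[customer_id] = {
            "order_count": len(rows),
            "total_spend": sum(amounts),
            "avg_order_value": sum(amounts) / len(amounts),
            # Recency is measured against an explicit reference date so the
            # result does not depend on when the script happens to run.
            "days_since_last_purchase": (as_of - last_purchase).days,
        }
    return features


features = build_customer_features(RAW_PURCHASES, as_of=date(2018, 3, 1))
for customer_id, attrs in sorted(features.items()):
    print(customer_id, attrs)
```

None of these attributes exist in the raw rows. Each one encodes an assumption about what might matter for the business question, which is why the value of the data depends more on how it is used than on its size.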

Science

Data processing algorithms (better known as machine learning) are the backbone of data science. A data scientist follows a rigorous process (such as CRISP-DM) to explore and analyze datasets while training and building the machine learning models.

A machine learning model solves a specific problem, such as predicting customer churn or identifying the most influential factors in a purchase pattern. From neural networks in the 1950s to sophisticated algorithms such as support vector machines and random forests, machine learning has not disappointed practitioners. What is most fascinating is the immediate feedback the model provides through the train-validate-test process. If done properly, this exploration always adds value, even when the final model does not reach the desired goal.
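The train-validate-test process can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not a recommended modeling approach: the data is synthetic, the helper names (`split_dataset`, `knn_predict`, `accuracy`) are hypothetical, and a one-feature nearest-neighbor classifier stands in for a real model.

```python
import random


def split_dataset(rows, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle a copy of the rows and cut it into train, validation, and test lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


def knn_predict(train, x, k):
    """Majority vote among the k training rows whose feature is closest to x."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    churn_votes = sum(1 for _, churned in nearest if churned)
    return churn_votes * 2 > k


def accuracy(train, rows, k):
    """Share of rows whose predicted label matches the true label."""
    hits = sum(knn_predict(train, x, k) == churned for x, churned in rows)
    return hits / len(rows)


# Hypothetical toy data: (days_inactive, churned). For illustration only,
# customers inactive for 60 or more days are labeled as churned.
rows = [(days, days >= 60) for days in range(100)]
train, val, test = split_dataset(rows)

# The validation set chooses the hyperparameter k; the test set is used
# once, at the end, to estimate how the chosen model generalizes.
candidate_ks = (1, 3, 5)
best_k = max(candidate_ks, key=lambda k: accuracy(train, val, k))
test_accuracy = accuracy(train, test, best_k)
print(f"k={best_k}, test accuracy={test_accuracy:.2f}")
```

Keeping the test set out of every tuning decision is what makes the final number a fair estimate. The validation scores, meanwhile, give the quick feedback loop described above.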

Technology

Advances in data processing and management tools have breathed life into machine learning models. While conventional spreadsheets and SQL continue to be major tools, an exceptional number of new tools have recently entered the landscape, especially where scale and rapid development are priorities.

Who would have thought a few years ago that Python and NoSQL would be competing with Java and SQL, respectively? We have seen rapid progress in, and adoption of, open-source tools, cloud platforms, SaaS, and APIs.

Distributed computing and technology are being democratized and have become the norm (e.g. Apache Spark and blockchain). Building large-scale, compute-intensive, real-world applications has become much less difficult thanks to smart and low-cost sensors, powerful GPUs such as the Tesla P100, and compute environments such as IBM's PowerAI. Have a look at the work of Matt Turck if you want to learn a bit more.

Business

Business KPIs and their impact are the most important aspect of data science, and the one most underrated by new entrants to the field. Every now and then, I meet data science enthusiasts, new graduates, and researchers with bright eyes (I used to have such a pair) who believe that being a data scientist means beating some benchmark. No! It's about meeting an objective: a business objective in 99% of cases. Yes, there are situations where you will be challenged by the underlying problem and will have to work some magic, but that is not the starting point.

Most traditional businesses are in a transition phase, many of them still digitizing, so many problems can be solved through automation, data analysis, and predictive modeling. In my short career, I have witnessed success stories across a range of applications: volume forecasting, churn prediction, routing optimization, real-time bidding, fine-grained image recognition, crop optimization, web analysis, insurance estimation, and vehicle control optimization, to name a few.


Topics:
big data, data science, machine learning, data analytics, predictive analytics

Published at DZone with permission of

