
Understanding Machine Learning


Get an overview of what machine learning is, how it works, the different types of machine learning, and what it is used for.


This article is featured in the new DZone Guide to Big Data: Data Science & Advanced Analytics. Get your free copy for more insightful articles, industry statistics, and more!

What Exactly Is Machine Learning?

Here’s the simplest definition I came across, from Berkeley: Machine learning is “[...] the branch of AI that explores ways to get computers to improve their performance based on experience.”

Let’s break that down to set some foundations on which to build our machine learning knowledge.

Branch of AI

Artificial intelligence is the study and development of computer systems that can successfully accomplish tasks that would typically require human intelligence. Machine learning is a part of that field. It’s the technology and process by which we train the computer to accomplish a given task.

Explores Models

Machine learning techniques are still emerging. Some models for training a computer are already recognized and used (as we will see below), but it is expected that more will be developed with time. The idea to remember here is that different models can be used when training a computer. Different business problems require different models.

Get Computers to Improve Their Performance

For a computer to accomplish a task with AI, it needs practice and adaptation. A machine learning model needs to be trained using data and, in most cases, a little human help.

Based on Experience

Providing an AI with experience is another way of saying “to provide it with data.” As more relevant data is fed into the system, the computer can generally respond more accurately to it and to future data that it will encounter. More accuracy in understanding the data means a better chance of successfully accomplishing its given task, or a higher degree of confidence when providing predictive insight.

Quick example:


  1. Entry data is chosen and prepared along with input conditions (e.g. credit card transactions).

  2. The machine learning algorithm is built and trained to accomplish a specific task (e.g. detect fraudulent transactions).

  3. The training data is augmented with the desired output information (e.g. these transactions appear fraudulent, these do not).

How Does Machine Learning Work?

Machine learning is often referred to as magical or a black box:

Insert data > magic black box > mission accomplished.

Let’s look at the training process itself to better understand how machine learning can create value with data.

Collect

Machine learning is dependent on data. The first step is to make sure you have the right data as dictated by the problem you are trying to solve. Consider your ability to collect it, its source, the required format, and so on.

Clean

Data can be generated by different sources, contained in different file formats, and expressed in different languages. You may need to add or remove information from your data set, as some instances might be missing information while others might contain undesired or irrelevant entries. How well the data is prepared affects its usability and the reliability of the outcome.
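As a rough illustration of this step, here is a minimal sketch that assumes hypothetical transaction records stored as dictionaries. The field names (`amount`, `country`) and the cleaning rules are assumptions made for this example only; real data sets will need their own rules.

```python
# Hypothetical raw records; the field names are assumptions for illustration.
raw_records = [
    {"amount": "12.50", "country": "CA"},
    {"amount": "", "country": "US"},        # missing amount
    {"amount": "87.10", "country": "us"},   # inconsistent casing
    {"amount": "abc", "country": "FR"},     # unparseable amount
    {"amount": "12.50", "country": "CA"},   # exact duplicate
]


def clean(records):
    """Drop unusable or duplicate rows and normalize the remaining ones."""
    cleaned = []
    seen = set()
    for record in records:
        try:
            amount = float(record["amount"])
        except (KeyError, ValueError):
            continue  # skip rows with a missing or malformed amount
        country = record.get("country", "").strip().upper()
        if not country:
            continue  # skip rows without a country
        key = (amount, country)
        if key in seen:
            continue  # skip exact duplicates
        seen.add(key)
        cleaned.append({"amount": amount, "country": country})
    return cleaned


cleaned_records = clean(raw_records)
print(cleaned_records)
```

Only two of the five records survive here: the rows with missing or malformed amounts and the duplicate are dropped, and the country code is normalized to upper case.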

Split

Depending on the size of your data set, only a portion of it may be required. This is usually referred to as sampling. From the chosen sample, your data should be split into two groups: one to train the algorithm and the other to evaluate it.
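The split itself can be sketched in a few lines. The function name `train_test_split`, the 80/20 ratio, and the fixed seed below are assumptions chosen for this example, not a prescribed recipe.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


data = list(range(100))  # stand-in for 100 prepared records
train, test = train_test_split(data)
print(len(train), len(test))
```

Shuffling before splitting matters when the data is ordered (for example, by date or by customer), since otherwise the evaluation group may not resemble the training group.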

Train

This stage aims to find the mathematical function that will accurately accomplish the chosen goal. Using a portion of your data set, the algorithm will attempt to process the data, measure its own performance, and automatically adjust its parameters until it can consistently produce the desired outcome with sufficient reliability. In neural networks, for example, this adjustment commonly relies on backpropagation, which computes how much each parameter contributed to the error.
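To make the "measure, then adjust" loop concrete, here is a minimal sketch using plain gradient descent on a two-parameter linear model. The toy data (generated from y = 2x + 1), the learning rate, and the number of steps are assumptions for illustration; this is not a neural network, only the simplest instance of the same idea.

```python
# Toy training data generated from y = 2x + 1 (an assumption for this sketch).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = 0.0, 0.0        # parameters the algorithm adjusts
learning_rate = 0.05


def mean_squared_error(w, b):
    """Measure performance: average squared gap between prediction and target."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


for step in range(2000):
    # Gradients of the mean squared error with respect to w and b.
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    # Move each parameter a small step against its gradient.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(round(w, 3), round(b, 3), mean_squared_error(w, b))
```

After training, `w` and `b` end up very close to 2 and 1, the values used to generate the data.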

Evaluate

Once the algorithm performs well with the training data, its performance is measured again with data that it has not yet seen. Additional adjustments are made when needed. This process allows you to detect overfitting, which happens when the learning algorithm performs well on your training data but poorly on new data.
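The sketch below illustrates why the held-out data matters. It compares a deliberately overfit "model" that only memorizes its training examples with a hand-written rule standing in for a model that generalizes. The task (label a number 1 if it is at least 50) and both models are assumptions invented for this example.

```python
# Hypothetical task: label a number as 1 if it is at least 50, else 0.
examples = [(x, int(x >= 50)) for x in range(100)]

# Deterministic split: every third example is held out for evaluation.
train = [(x, label) for x, label in examples if x % 3 != 0]
test = [(x, label) for x, label in examples if x % 3 == 0]

# A "model" that only memorizes the training examples.
memory = dict(train)


def memorizer(x):
    return memory.get(x, 0)  # unseen inputs fall back to a fixed guess


def threshold_rule(x):
    return int(x >= 50)  # stand-in for a model that captured the relationship


def accuracy(model, rows):
    """Fraction of rows for which the model returns the correct label."""
    return sum(model(x) == label for x, label in rows) / len(rows)


print("memorizer:", accuracy(memorizer, train), accuracy(memorizer, test))
print("threshold:", accuracy(threshold_rule, train), accuracy(threshold_rule, test))
```

Both models score perfectly on the training data, but the memorizer drops to 50% on the held-out examples. Only the second measurement reveals the difference.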

Optimize

The model is optimized before integration within the destined application to ensure it is as lightweight and fast as possible.

Are There Different Types of Machine Learning?

There are many different models that can be used in machine learning, but they are typically grouped into three different types of learning: supervised, unsupervised, and reinforcement. Depending on the task, some models are more appropriate than others.

Supervised Learning

With this type of learning, the correct outcome for each data point is explicitly labeled when training the model. This means the learning algorithm is already given the answer when reading the data. Rather than finding the answer, it aims to find the relationship between the data and its labels so that when unlabeled data points are introduced, it can correctly classify or predict them.


In a classification context, the learning algorithm could be, for example, fed with historic credit card transactions, each labeled as safe or suspicious. It would learn the relationship between these two classifications and could then label new transactions appropriately, according to the classification parameters (e.g. purchase location, time between transactions, etc.).
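A minimal sketch of that idea follows, using a nearest-centroid classifier: it averages the labeled examples of each class and assigns a new transaction to the closest average. The features (amount and distance from home), the values, and the choice of classifier are all assumptions for illustration; production fraud detection uses far richer features and models.

```python
import math

# Hypothetical labeled transactions: (amount in dollars, km from home), label.
training_data = [
    ((20.0, 2.0), "safe"),
    ((35.0, 5.0), "safe"),
    ((15.0, 1.0), "safe"),
    ((900.0, 800.0), "suspicious"),
    ((750.0, 1200.0), "suspicious"),
    ((1100.0, 950.0), "suspicious"),
]


def fit_centroids(rows):
    """Average the feature vectors of each label."""
    sums, counts = {}, {}
    for features, label in rows:
        total = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in total] for label, total in sums.items()}


def predict(centroids, features):
    """Return the label whose centroid is closest to the features."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


centroids = fit_centroids(training_data)
print(predict(centroids, (25.0, 3.0)))      # close to the "safe" examples
print(predict(centroids, (980.0, 1000.0)))  # close to the "suspicious" examples
```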


In a context where the value to predict is continuous rather than a category, like a stock’s price through time, a regression learning algorithm can be used to predict the following data point.
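As a small illustration, the sketch below fits a straight line to a short series by least squares and extrapolates one step ahead. The prices are made up and perfectly linear, which real stock prices are not; the point is only to show the mechanics of regression.

```python
def fit_line(ys):
    """Least-squares fit of y = slope * t + intercept for t = 0, 1, 2, ..."""
    n = len(ys)
    ts = range(n)
    mean_t = sum(ts) / n
    mean_y = sum(ys) / n
    covariance = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
    variance = sum((t - mean_t) ** 2 for t in ts)
    slope = covariance / variance
    intercept = mean_y - slope * mean_t
    return slope, intercept


# Hypothetical closing prices for five consecutive days.
prices = [10.0, 10.5, 11.0, 11.5, 12.0]
slope, intercept = fit_line(prices)

# Predict the next point in the series (t = 5).
next_price = slope * len(prices) + intercept
print(slope, intercept, next_price)
```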


Unsupervised Learning

In this case, the learning algorithm is not given the answer during training. Its objective is to find meaningful relationships between the data points. Its value lies in discovering patterns and correlations. For example, clustering is a common use of unsupervised learning in recommender systems (e.g. people who liked this bottle of wine also enjoyed this one).
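Clustering can be sketched with a bare-bones k-means on one-dimensional data. The numbers below are hypothetical average bottle prices per customer, and the starting centers are chosen by hand; both are assumptions for this example. No labels are provided, yet the algorithm separates the customers into two groups.

```python
def k_means_1d(values, centers, iterations=10):
    """Plain k-means on numbers, starting from the given centers."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster.
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical average bottle price per customer, in dollars.
spend = [8, 9, 10, 11, 12, 48, 50, 52, 55]
centers, clusters = k_means_1d(spend, centers=[0, 100])
print(centers)
print(clusters)
```

A recommender could then suggest to a customer the bottles that are popular within the same cluster.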


Reinforcement Learning

This type of learning differs from both supervised and unsupervised learning: instead of learning from a fixed data set, an agent learns by interacting with an environment and receiving feedback in the form of rewards or penalties. It is usually used to solve more complex problems. Data is provided by the environment and allows the agent to respond and learn. In practice, applications range from controlling robotic arms to find the most efficient motor combination, to robot navigation, where collision avoidance behavior can be learned from the negative feedback of bumping into obstacles. Games are also well-suited to reinforcement learning, as they are traditionally defined as a sequence of decisions. Examples include poker, backgammon, and more recently Go, with the success of AlphaGo from Google DeepMind. Other applications of reinforcement learning are common in logistics, scheduling, and tactical planning of tasks.
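The feedback loop can be sketched with tabular Q-learning, one common reinforcement learning method, on a tiny invented environment: a corridor of five cells where the agent starts on the left and is rewarded for reaching the right end. The environment, the reward values, and the hyperparameters are all assumptions for illustration.

```python
import random

N_STATES = 5         # cells 0..4; cell 4 is the goal
ACTIONS = [-1, +1]   # move left or move right
rng = random.Random(0)

# q[state][action_index] estimates the long-term reward of each move.
q = [[0.0, 0.0] for _ in range(N_STATES)]
alpha, gamma, epsilon = 0.5, 0.9, 0.2  # learning rate, discount, exploration

for episode in range(500):
    state = 0
    while state != N_STATES - 1:
        # Explore occasionally, otherwise take the best-known action.
        if rng.random() < epsilon:
            a = rng.randrange(2)
        else:
            a = max(range(2), key=lambda i: q[state][i])
        # The environment responds with a new state and a reward.
        next_state = min(max(state + ACTIONS[a], 0), N_STATES - 1)
        reward = 1.0 if next_state == N_STATES - 1 else -0.1
        # Nudge the estimate toward reward plus discounted future value.
        target = reward + gamma * max(q[next_state])
        q[state][a] += alpha * (target - q[state][a])
        state = next_state

# Greedy policy: the preferred move in each non-goal cell.
policy = [ACTIONS[max(range(2), key=lambda i: q[s][i])] for s in range(N_STATES - 1)]
print(policy)
```

The agent is never told that moving right is correct. It discovers this from the small penalty on every wasted step and the reward at the goal, and the learned policy moves right from every cell.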

What Can Machine Learning Be Used For?

To help you identify what situations can be tackled with machine learning, start with your data. Look for areas in your business that produce data in large quantities, and consider what value can be derived from it.

Machine learning is different from other technological advancements; it is not a plug-and-play solution, at least not yet. Machine learning can be used to tackle many situations, and each situation requires a specific data set, model, and parameters to produce valuable results.

This means you need a clearly defined objective when starting out. Machine learning is making considerable advances in many fields, and all functions within an organization are likely to see disruptive advancements in the future. Nonetheless, some fields are riper than others to pursue its adoption.

I believe there are two functions in particular that are trailblazing businesses’ adoption of machine learning:

  1. Logistics and production.

  2. Sales and marketing.

The reason why these two areas are leading the way to a more widespread integration of machine learning within daily practices is simple: they promise a direct influence on ROI.

Most gains from its use can be categorized into two major fields: predictive insight and process automation, both of which can be used in ways that can either lower costs or increase revenue.

Predictive insight:

  • Predictive insight into customers’ behavior will provide you with more opportunities for sales.

  • Anticipating medicine effectiveness can reduce time to market.

  • Forecasting when a user is about to churn can improve retention.

In this context, machine learning has the potential to increase your responsiveness by providing you with the tools and information to make decisions faster and more accurately.

Process automation and efficiency:

  • Augmenting investment management decisions with machine-learning-powered software can provide better margins and help mitigate costly mistakes.

  • Robotic arm movement training can increase your production line’s precision and reduce your need for quality control.

  • Resource distribution according to user demand can save time and costs during delivery.

When machine learning is used in this context, your business becomes smarter. Your processes and systems augment your value proposition, and your resources are used more efficiently.


Topics:
big data, machine learning, artificial intelligence
