Machine Learning in a Box (Part 3): Algorithm Learning Styles
Let’s take a look at the main learning styles for machine learning algorithms and the associated sub-categories such as regression and clustering.
Welcome to Week 3 of Machine Learning in a Box!
In case you are joining the series midway, here is the link to the introduction article of the Machine Learning in a Box series, which lets you read the series from the start. At the end of the introduction article, you will find links to each article of the series.
Quick Recap From Last Week
Last week, we saw how a project methodology could help you become successful with your Machine Learning projects.
In my previous article, "Machine Learning in a Box (Week 2): Project Methodologies," I wrote about project methodologies. There, you will find some personal thoughts about the CRISP-DM methodology.
Algorithm Learning Styles
When I started my journey at KXEN, I didn’t have a pedigree in data mining or data science. My background is in tech support and programming. So I understood what the word algorithm means in programming, and I discovered that it means the same thing in data science.
When we were at school, we all solved algebra problems like “Find the equation of the line that passes through the points (-1, -1) and (1, 2)” or “Find the minima and maxima of the function $f(x) = x^4 - 8x^2 + 5$.” We did that manually by applying an algorithm we learned in the classroom.
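Both classroom problems are solved by following a fixed procedure step by step, which is exactly what an algorithm is. The small Python sketch below is an illustration written for this post and is not part of the original series. It covers the two manual procedures. The first finds the slope and intercept of the line through two points. The second applies the second-derivative test to the critical points of $f(x) = x^4 - 8x^2 + 5$: here $f'(x) = 4x^3 - 16x = 4x(x - 2)(x + 2)$ vanishes at $x = -2, 0, 2$, and $f''(x) = 12x^2 - 16$. The function names are hypothetical.

```python
from fractions import Fraction


def line_through(p1, p2):
    """Slope and intercept of the line y = slope * x + intercept through two points."""
    (x1, y1), (x2, y2) = p1, p2
    slope = Fraction(y2 - y1, x2 - x1)  # exact rational arithmetic, as on paper
    intercept = y1 - slope * x1
    return slope, intercept


def f(x):
    return x**4 - 8 * x**2 + 5


def f_second(x):
    return 12 * x**2 - 16


# Solved by hand: f'(x) = 4x^3 - 16x = 4x(x - 2)(x + 2) = 0 gives x = -2, 0, 2.
CRITICAL_POINTS = [-2, 0, 2]


def classify_critical_points():
    """Second-derivative test: f'' > 0 means a local minimum, f'' < 0 a local maximum."""
    results = []
    for x in CRITICAL_POINTS:
        curvature = f_second(x)
        if curvature > 0:
            kind = "local minimum"
        elif curvature < 0:
            kind = "local maximum"
        else:
            kind = "inconclusive"
        results.append((x, f(x), kind))
    return results


print(line_through((-1, -1), (1, 2)))  # (Fraction(3, 2), Fraction(1, 2))
print(classify_critical_points())
# [(-2, -11, 'local minimum'), (0, 5, 'local maximum'), (2, -11, 'local minimum')]
```

So the line is y = 1.5x + 0.5. The function f reaches its minimum value of -11 at x = -2 and x = 2, with a local maximum of 5 at x = 0.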
For any single type of problem, there are plenty of algorithms that can help you solve it. In a Machine Learning project, that type of problem is usually defined by our data mining goal.
So, we need a way to organize our toolbox of algorithms. There are many ways to organize and group algorithms together, and here I will use something called the “learning style.”
Using the “learning style” helps you think about how you will be preparing and using your data to build your model. Ultimately, you will try to pick the most appropriate algorithms to test and compare results.
Let’s take a look at the main learning styles for machine learning algorithms and the associated sub-categories.
Supervised Learning
With supervised learning, you will infer a function using a set of labeled data where the outcome (the target) is known. This dataset is also known as the training dataset.
Each record of the training dataset can be represented as a pair consisting of an input vector of features (also called variables or dimensions) and the associated output value.
Therefore, the goal of a supervised learning algorithm is to analyze the training data and produce a function that can score a new input vector of features and return the predicted output value.
This requires the algorithm to generalize patterns (captured in the inferred function) from the training data so that it can determine, in a “reasonable” way, the correct output value for any new and unseen input vector of features.
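The minimal Python sketch below makes this vocabulary concrete. It is my own illustration and not taken from the original article. The training dataset is a list of (feature vector, output) pairs. "Training" returns a function, and that function scores a new, unseen feature vector. The learner is a 1-nearest-neighbour rule, chosen only because it is short. The names `train` and `score` are hypothetical, and the data points are made up.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Each training record pairs an input vector of features with its known output (the target).
TrainingSet = List[Tuple[Sequence[float], str]]


def train(training_data: TrainingSet) -> Callable[[Sequence[float]], str]:
    """Infer a scoring function from labeled pairs using a 1-nearest-neighbour rule."""
    def score(features: Sequence[float]) -> str:
        # Predict the output of the closest known example (Euclidean distance).
        _, label = min(training_data, key=lambda pair: math.dist(pair[0], features))
        return label
    return score


training_data: TrainingSet = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"),
    ((6.0, 5.0), "B"), ((7.0, 6.5), "B"),
]
score = train(training_data)
print(score((2.0, 1.5)))  # A
print(score((6.5, 6.0)))  # B
```

Neither (2.0, 1.5) nor (6.5, 6.0) appears in the training data. The inferred function still assigns each of them an output by generalizing from the closest labeled example.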
There is a wide range of supervised learning algorithms, and each comes with its own strengths and weaknesses. In other words, there isn’t a “magic” algorithm that can address all supervised learning problems.
You can further group supervised learning algorithms like this:
Classification
This is applicable when your target is represented as a category or a class, like “true” and “false” or “A” and “B” for a binary classification, or “A,” “B,” and “C” for a multi-class classification.
The following diagram depicts a simple classification example where each icon is positioned based on its input values (x1 and x2 axes) and colored based on its output value. The inferred function is the green line (a linear function here), and each question mark is a new input that the inferred function will assign to one side or the other.
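One classic way to infer a linear boundary like the green line is the perceptron learning rule. The plain-Python sketch below is an illustration and not the algorithm behind the original diagram. It assumes the two classes are linearly separable, because the perceptron only converges in that case. The toy points and the function names are made up.

```python
from typing import List, Sequence, Tuple


def train_perceptron(points: List[Sequence[float]], labels: List[int],
                     epochs: int = 100) -> Tuple[List[float], float]:
    """Learn weights w and bias b so that sign(w . x + b) matches labels in {-1, +1}."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(points, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # misclassified (or on the line): nudge the line
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                errors += 1
        if errors == 0:  # every training point is on the correct side
            break
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Assign a new point to one side of the line w . x + b = 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


# Toy data: class -1 sits in the lower left, class +1 in the upper right.
points = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (5.0, 5.0), (6.0, 5.0), (5.0, 6.0)]
labels = [-1, -1, -1, 1, 1, 1]
w, b = train_perceptron(points, labels)
print(predict(w, b, (1.5, 1.5)), predict(w, b, (5.5, 5.5)))  # -1 1
```

The two query points play the role of the question marks in the diagram. Each one lands on one side of the learned line.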
Regression
This is applicable when your target is represented as a continuous number, like revenue, a weight, or a temperature.
The following diagram depicts a simple regression example where each mark is positioned based on its input value (x-axis) and its output value (y-axis). The inferred function is the green line, which gives you the “y” output value for any “x” input value.
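The simplest way to infer such a line is ordinary least squares with a single input feature. The Python sketch below is a generic illustration and not necessarily how the original figure was produced. The data values are made up.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept (one input feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]
slope, intercept = fit_line(xs, ys)


def predict(x: float) -> float:
    """The inferred function: a "y" estimate for any "x", including unseen ones."""
    return slope * x + intercept


print(round(slope, 2), round(intercept, 2), round(predict(6.0), 2))
```

Here, x = 6.0 is outside the training data, yet the inferred line still returns a continuous estimate for it. This is exactly what distinguishes the regression target from a classification label.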
Time Series Forecasting
This is applicable when your training dataset represents a signal or a series of values, and you need to infer the next N values from the previous ones.
Some people may argue that time series forecasting is a kind of regression, except that the inferred function for a time series produces a series of values instead of a single value as in a regression.
In addition, the dataset structure for a time series requires an “order” column with unique values (usually a date, but it could be an incrementing column in some cases).
The following example shows a series of points at a fixed interval (the blue dots). The function inferred by the time series algorithm (the green line) represents a cosine function that can be used to predict the next 5 values (the red dots).
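The original example does not say which algorithm produced the cosine-shaped fit. As one hedged illustration, the sketch below fits a simple autoregressive model of order 2, AR(2), by least squares. Each value is predicted from the two previous ones, so the series must first be sorted by its order column. A sampled cosine happens to satisfy such a recurrence exactly ($y_t = 2\cos(\omega)\,y_{t-1} - y_{t-2}$), which lets this model extend the wave and predict the next 5 values. The function names are hypothetical. Real, noisy series usually need more careful modeling.

```python
import math
from typing import List, Sequence, Tuple


def fit_ar2(series: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of y[t] = a * y[t-1] + b * y[t-2] (no intercept)."""
    # Accumulate the 2x2 normal equations from every (y[t-2], y[t-1]) -> y[t] window.
    s11 = s12 = s22 = r1 = r2 = 0.0
    for t in range(2, len(series)):
        lag1, lag2, target = series[t - 1], series[t - 2], series[t]
        s11 += lag1 * lag1
        s12 += lag1 * lag2
        s22 += lag2 * lag2
        r1 += lag1 * target
        r2 += lag2 * target
    # Solve [[s11, s12], [s12, s22]] @ [a, b] = [r1, r2] with Cramer's rule.
    det = s11 * s22 - s12 * s12
    a = (r1 * s22 - r2 * s12) / det
    b = (s11 * r2 - s12 * r1) / det
    return a, b


def forecast(series: Sequence[float], steps: int) -> List[float]:
    """Predict the next `steps` values, feeding each prediction back in as input."""
    a, b = fit_ar2(series)
    history = list(series)
    for _ in range(steps):
        history.append(a * history[-1] + b * history[-2])
    return history[len(series):]


# Observations at a fixed interval, already sorted by their order column (t = 0..19).
observed = [math.cos(0.5 * t) for t in range(20)]
print([round(v, 3) for v in forecast(observed, 5)])
```

The output is a series of values rather than a single value. This matches the distinction from regression described above.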
To summarize, the big difference between a classification and a regression is the representation of the target variable (the output), where one is discrete (categories) and the other is continuous.
Unsupervised Learning
As opposed to supervised learning, with unsupervised learning you infer a function using a set of unlabeled data (no defined outcome).
Therefore, the inferred function is meant to describe the hidden underlying structure, patterns, or distribution in the data. Unlike with supervised learning, there is no known outcome to compare against, so there is no real way to evaluate the accuracy or relevance of the structures and patterns found.
You can further group unsupervised learning algorithms like this:
Clustering
This type of algorithm is applicable when you need to define groups of entities (a.k.a. clusters) based on the “similarity” or “distance” of the entity attributes compared to the overall distribution. Each clustering algorithm has its own grouping strategy, based, for example, on the distance to a center, the group density, or the group distribution. Likewise, some algorithms allow overlapping clusters or residual items (entities that belong to no cluster), while others prevent them.
In the following example, the algorithm has defined 5 clusters using the distance to the center.
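A common distance-to-center strategy is k-means. The plain-Python sketch below is a generic k-means (Lloyd's algorithm) and is not necessarily what produced the original figure. It alternates between two steps: assigning each point to its nearest center, then moving each center to the mean of its points. For brevity, it takes the first k points as initial centers. Real implementations use smarter seeding, such as k-means++, and several restarts. The data points are made up.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def k_means(points: Sequence[Point], k: int,
            iterations: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's algorithm: returns the cluster centers and each point's cluster index."""
    centers: List[Point] = [tuple(p) for p in points[:k]]  # naive initialization
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest center.
        assignment = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members.
        new_centers: List[Point] = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if not members:  # keep an empty cluster's center where it was
                new_centers.append(centers[c])
                continue
            new_centers.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
        if new_centers == centers:  # converged: no center moved
            break
        centers = new_centers
    return centers, assignment


points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 8.5), (1.0, 0.5), (8.5, 9.0)]
centers, labels = k_means(points, k=2)
print(labels)  # [0, 0, 1, 1, 0, 1]
```

No labels were given as input. The grouping comes entirely from the distances between points, which is what makes this an unsupervised technique.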
Association Rules
You can apply this type of algorithm when you have a transactional dataset linking items together, or users to items, and your goal is to extract rules about those relations. A common example of a rule is that when you buy Y, you also buy X. Of course, rules can be longer, involving multiple items, or they can enforce a certain sequence.
In the example below, a set of rules is extracted from a series of user shopping transactions.
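Association rule miners commonly score a rule "if X, then Y" with two measures. Support is the share of transactions containing both items. Confidence is the share of transactions containing X that also contain Y. The brute-force sketch below only considers rules between two single items. Algorithms such as Apriori or FP-Growth prune the search so that longer rules stay tractable. The baskets, thresholds, and the name `pair_rules` are made up for illustration.

```python
from collections import Counter
from itertools import combinations, permutations
from typing import List, Set, Tuple


def pair_rules(transactions: List[Set[str]], min_support: float,
               min_confidence: float) -> List[Tuple[str, str, float, float]]:
    """Mine one-item rules 'if X then Y' as (X, Y, support, confidence) tuples."""
    n = len(transactions)
    item_counts = Counter(item for basket in transactions for item in basket)
    pair_counts = Counter(
        frozenset(pair) for basket in transactions for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for pair, count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        for x, y in permutations(sorted(pair), 2):
            confidence = count / item_counts[x]  # share of X-baskets that also contain Y
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)


transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "jam"},
    {"milk", "cereal"},
]
print(pair_rules(transactions, min_support=0.5, min_confidence=0.8))
# [('butter', 'bread', 0.5, 1.0)]
```

The rule is directional. Everyone who bought butter also bought bread (confidence 1.0), but only two of the three bread buyers bought butter. So "bread, then butter" falls below the 0.8 threshold.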
Other Learning Styles
Semi-Supervised Learning
With semi-supervised learning, only a portion of the input data is labeled, which means that the algorithm must learn the structures that organize the data as well as make predictions.
Labeling all your data can become really expensive and time-consuming, and worse, some of it could be labeled incorrectly.
Take an image library as an example: usually, only a small portion of the images will be labeled.
Therefore, both unsupervised and supervised techniques are leveraged to make the best use of the unlabeled data. The algorithm either clusters the unlabeled data together with the labeled data or makes best-guess predictions for it, and then uses all of that to build the model.
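Self-training is one simple semi-supervised strategy. You start from the labeled examples and give a best-guess label to the unlabeled example the model is most confident about. That example joins the training set, and the process repeats. The sketch below uses nearest-neighbour distance as a stand-in for confidence. This is an illustrative assumption, not the specific method described by the sources. The points, labels, and the name `self_train` are made up.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def self_train(labeled: Dict[Point, str], unlabeled: Sequence[Point]) -> Dict[Point, str]:
    """Propagate labels one point at a time, closest unlabeled point first."""
    known = dict(labeled)
    remaining: List[Point] = list(unlabeled)
    while remaining:
        # Find the unlabeled point that is closest to any already-labeled point.
        best_point, best_label, best_dist = None, None, math.inf
        for p in remaining:
            for q, label in known.items():
                d = math.dist(p, q)
                if d < best_dist:
                    best_point, best_label, best_dist = p, label, d
        # Treat that best guess as if it were a real label and continue.
        known[best_point] = best_label
        remaining.remove(best_point)
    return known


labeled = {(0.0, 0.0): "cat", (10.0, 10.0): "dog"}
unlabeled = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (9.0, 9.0), (8.0, 8.0)]
print(self_train(labeled, unlabeled))
```

Only two points start with a label. The remaining five receive labels by chaining through their neighbours, which mirrors the idea of clustering unlabeled data together with the labeled data.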
Reinforcement Learning
With reinforcement learning, the algorithm tries to find the “best way” (a sequence of decisions or actions) to earn the greatest “reward.”
Typically, at every step, the algorithm takes a decision in an environment, and that decision leads to a reward and a new state. By repeating this many times, the algorithm learns how to improve its decisions and earn greater rewards.
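Tabular Q-learning is a classic way to learn from rewards. The sketch below is a toy illustration with a made-up environment: a five-cell corridor where reaching the last cell pays a reward of 1. The environment, the hyperparameters (alpha, gamma, epsilon), and the episode count are all assumptions chosen to keep the example small. At each step, the agent picks an action: usually the best one it knows, sometimes a random one to explore. The environment returns a reward and a new state, and the agent updates its estimate of that action's value.

```python
import random
from typing import List, Tuple

N_STATES = 5          # corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)    # move left or move right


def step(state: int, action: int) -> Tuple[int, float, bool]:
    """Toy environment: returns (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    if next_state == N_STATES - 1:
        return next_state, 1.0, True
    return next_state, 0.0, False


def q_learning(episodes: int = 500, alpha: float = 0.5, gamma: float = 0.9,
               epsilon: float = 0.1, seed: int = 0) -> List[List[float]]:
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))  # explore
            else:
                best = max(q[state])  # exploit, breaking ties at random
                a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Move Q(s, a) toward the reward plus the discounted value of the best next action.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
            if done:
                break
    return q


q = q_learning()
policy = ["right" if row[1] > row[0] else "left" for row in q[:-1]]
print(policy)  # ['right', 'right', 'right', 'right']
```

After training, the greedy policy moves right in every non-goal cell. The learned value of moving right shrinks by a factor of gamma for each extra step needed to reach the reward, which is how the agent learns that the shortest path earns the most.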
To write this article, I leveraged several sources for inspiration, details, and ideas.
- Machine Learning 101 by Towards Data Science
- Machine Learning Explained by Ronald van Loon
I hope this blog post helps clear up some of the lingo around Machine Learning. I also hope it helps you understand that these algorithms are here to help you produce the best “functions” from your training data (labeled or not), which you can then apply to new sets of data.
Next week, we will start looking at what to install to get started. So, get your internet connection ready to download SAP HANA, express edition, along with some additional components and tools.
Would you find it useful to use Slack to discuss this blog series and engage?
Any other suggestions are welcome! Let me know in the comments.
Published at DZone with permission of Abdel Dadouche, DZone MVB. See the original article here.