Machine Learning: Basics and Takeaways
An explanation of what machine learning is, its connection to data mining, and some high-level benefits for business owners.
Machine Learning techniques can help modern businesses run much more efficiently and in a more predictable way. In this article, I am going to scratch the surface of machine learning basics and explain what the key takeaways are for business owners.
Before we get started, let’s try to define what machine learning means. In general, machine learning is a technique in which the behavior of an application or algorithm changes based on the results of past iterations. In other words, it is a form of machine self-education that mimics the way humans learn from experience.
Does this mean the birth of a new Skynet? No, not really. That is what many non-technical people believe, but at this point it is only a scary fairy tale. The misunderstanding comes, for example, from misinterpreting the words of Elon Musk. Today, machine learning is fully controlled by humans, is used to complete specific tasks, and is closely related to the concept of “Data Mining”, which was popular some time ago.
Basically, each machine learning task is constructed from the following components:
A dataset of samples, or Training Set (TS), that is used as input for the learning algorithm. An example for a business would be data about the number of people living in a specific city and the number of sales in that city.
When this data is arranged as a table, each row represents a particular “training example” that will be analyzed by machine learning.
Each column represents a fact, or so-called “feature”, that we would like to compare with other features to find hidden dependencies between them.
An initial hypothesis that we would like to improve using an ML technique. A hypothesis contains our initial model of how the features in the training set depend on one another. For example, in the training set described above, we may look for a dependency between the number of sales and the population, and derive a formula that we can use to predict sales in some other city without opening a point of sale there. Choosing a hypothesis and expressing it in a form that a machine learning framework can understand is one of the critical steps. In some cases, however, the hypothesis is not clear in advance and is generated on the fly from the training set.
A cost function: a formula that we use to measure how well our hypothesis is working and how accurate it is.
And finally, a learning algorithm. This is where the magic happens. The learning algorithm is usually complex, math-heavy code that tweaks the hypothesis in one way or another and, after each iteration, applies the cost function to measure the results of the tweaks.
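The four components above can be sketched in a few lines of Python. This is only a minimal illustration, not a production approach: the numbers in the training set are made up, and the choices of a linear hypothesis, a halved mean squared error as the cost function, and batch gradient descent as the learning algorithm are assumptions picked for simplicity. All function and variable names are hypothetical.

```python
# Toy training set (made-up numbers, for illustration only).
# Each row is one training example: (population in millions, sales in thousands).
training_set = [
    (1.0, 3.0),
    (2.0, 5.0),
    (3.0, 7.0),
    (4.0, 9.0),
]


def hypothesis(theta0, theta1, population):
    """Assumed model: sales depend linearly on population."""
    return theta0 + theta1 * population


def cost(theta0, theta1, examples):
    """Halved mean squared error of the hypothesis over the training set."""
    m = len(examples)
    return sum((hypothesis(theta0, theta1, x) - y) ** 2 for x, y in examples) / (2 * m)


def learn(examples, learning_rate=0.1, iterations=2000):
    """Batch gradient descent: tweak the parameters, then re-measure the cost."""
    theta0, theta1 = 0.0, 0.0
    m = len(examples)
    history = []
    for _ in range(iterations):
        errors = [hypothesis(theta0, theta1, x) - y for x, y in examples]
        # Partial derivatives of the cost with respect to each parameter.
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, (x, _) in zip(errors, examples)) / m
        theta0 -= learning_rate * grad0
        theta1 -= learning_rate * grad1
        history.append(cost(theta0, theta1, examples))
    return theta0, theta1, history


theta0, theta1, history = learn(training_set)

# Use the learned formula to estimate sales in a city we have no data for.
predicted_sales = hypothesis(theta0, theta1, 5.0)
print(f"sales = {theta0:.2f} + {theta1:.2f} * population")
print(f"cost after first iteration: {history[0]:.4f}, after last: {history[-1]:.2e}")
```

The `history` list records the cost after every iteration, which is one simple way to check that the tweaks made by the learning algorithm are actually improving the hypothesis.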
The rise of machine learning is happening right now because it requires huge computational power to perform multiple iterations over the same training set. The more examples you have in a training set, and the more facts you collect for each example, the more insights you can potentially get.
For example, if you add one more fact, “the number of stores in the city”, your hypothesis may become much more accurate, because other vendors may affect your performance. The same goes for the average temperature in a city.
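To illustrate the effect of an extra fact, here is a small sketch that fits the same kind of linear hypothesis twice: once with population only, and once with population plus the number of competing stores. The rows are made up so that sales really do depend on both facts, and gradient descent is again just an assumed, simple choice of learning algorithm. All names are hypothetical.

```python
# Made-up rows for illustration: (population in millions, competing stores, sales).
rows = [
    (1.0, 2.0, 2.0),
    (2.0, 1.0, 4.5),
    (3.0, 4.0, 5.0),
    (4.0, 2.0, 8.0),
    (5.0, 5.0, 8.5),
]


def predict(weights, features):
    """Linear hypothesis: weights[0] is the intercept, the rest pair with features."""
    return weights[0] + sum(w * f for w, f in zip(weights[1:], features))


def cost(weights, examples):
    """Halved mean squared error over all (features, target) examples."""
    m = len(examples)
    return sum((predict(weights, f) - y) ** 2 for f, y in examples) / (2 * m)


def fit(examples, learning_rate=0.05, iterations=5000):
    """Batch gradient descent for any number of features."""
    n = len(examples[0][0])
    m = len(examples)
    weights = [0.0] * (n + 1)
    for _ in range(iterations):
        errors = [predict(weights, f) - y for f, y in examples]
        grads = [sum(errors) / m]
        for j in range(n):
            grads.append(sum(e * f[j] for e, (f, _) in zip(errors, examples)) / m)
        weights = [w - learning_rate * g for w, g in zip(weights, grads)]
    return weights


# Same cities, described with one fact versus two facts.
one_feature = [((pop,), sales) for pop, _, sales in rows]
two_features = [((pop, stores), sales) for pop, stores, sales in rows]

cost_one = cost(fit(one_feature), one_feature)
cost_two = cost(fit(two_features), two_features)
print(f"cost with population only:       {cost_one:.4f}")
print(f"cost with population and stores: {cost_two:.2e}")
```

On this toy data the two-fact hypothesis ends up with a much lower cost. Note that every extra fact adds another weight to update on every iteration, so the amount of computation grows with the number of facts.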
Adding more facts into the equation requires much more computational power, which was not available before. As the most extreme case of this factor-based problem, think about weather prediction, which is a classic problem for ML. Almost any weather prediction station uses HPC (high-performance computing) clusters to analyze and compare millions of facts to get an accurate weather forecast.
With modern technology, specifically CUDA-capable GPUs from NVIDIA, thousands of processing cores now fit on a single board or even in a single chip. This opens up many possibilities for data researchers, and there is no need for supercomputers to analyze examples that contain up to 10,000 facts. However, when Big Data comes into play and we have billions of examples, each containing millions of facts, dedicated computing farms are still a must.
What Can You Gain From Machine Learning?
If your business uses an ERP or CRM system, you already have data that you can use to get insights, as you already have thousands of facts to choose from.
You do not need to immediately start collecting all the other data that you could get from every possible source. Instead, start by analyzing the data that you already have.
Human resources and expertise are more important than tools for ML, as most of the tools are free for small-scale data.
Hire a data scientist, provide all the datasets that you have, and start gaining insights.
To gain insights in specific areas, start collecting more examples in that area, i.e., add more logging to your e-commerce site to understand your shoppers’ behavior.
Use hypotheses generated by machine learning to improve your procurement and forecasting.
Employ hypotheses as models to monitor your business execution and limit the number of human errors.
Jump into a Big Data approach only when you understand what insights you can gain from your business data.