Fundamentals of Machine Learning
Let's face it - computing was created to analyze data and machine learning represents the state-of-the-art in making sense of data. For many years it has been out of reach for the common developer.
Machine learning is perhaps one of the highest-paid and most sought-after skills today. No question about it - this is the place to really make it big as a developer.
Figure 1: The world of machine learning
Machine learning represents the logical extension of simple data retrieval and storage. It is about developing building blocks that make computers learn and behave more intelligently.
Machine learning makes it possible to mine historical data and make predictions about future trends. Without realizing it, you are probably already using the benefits of machine learning. Search engine results, online recommendations, ad targeting, fraud detection, and spam filtering are all examples of what is possible with machine learning.
Machine learning is about making data-driven decisions. While instinct might be important, it is difficult to beat empirical data.
The many facets of machine learning
Once you dive deep into the field, you start running into topics such as:
Supervised and unsupervised learning
Markov models and Bayesian networks and much more
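To make the difference between supervised and unsupervised learning concrete, here is a minimal Python sketch. The data is hypothetical (a count of suspicious words per email), and both functions are toy illustrations rather than production algorithms: the supervised one learns from labeled examples, while the unsupervised one receives no labels and only looks for structure in the values.

```python
# Supervised learning: every training example carries a label.
def nearest_neighbor_label(labeled_points, query):
    """Return the label of the training value closest to `query` (1-nearest neighbor)."""
    closest = min(labeled_points, key=lambda pair: abs(pair[0] - query))
    return closest[1]


# Unsupervised learning: no labels, only the raw values.
def split_at_largest_gap(values):
    """Split the values into two groups at the widest gap between sorted neighbors."""
    ordered = sorted(values)
    gaps = [ordered[i + 1] - ordered[i] for i in range(len(ordered) - 1)]
    cut = gaps.index(max(gaps)) + 1
    return ordered[:cut], ordered[cut:]


# Hypothetical training data: (suspicious word count, label).
training = [(0, "ham"), (1, "ham"), (7, "spam"), (9, "spam")]

print(nearest_neighbor_label(training, 8))   # label predicted from labeled examples
print(split_at_largest_gap([9, 0, 7, 1]))    # groups discovered without any labels
```

The supervised function can name its answer ("spam" or "ham") because the labels were given to it. The unsupervised function can only report that two groups exist; deciding what they mean is left to you.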
Mahout and Hadoop
The Apache Mahout project's goal is to build a scalable machine learning library.
There is some degree of overlap with big data analytics within a Hadoop
There is an entire machine learning open-source project that you can get for free with Hadoop. You can learn more here:
Mahout includes algorithms for clustering, classification and collaborative filtering. You can also find:
Matrix factorization based recommenders
K-Means, Fuzzy K-Means clustering
Latent Dirichlet Allocation
Singular Value Decomposition
Logistic regression classifier
(Complementary) Naive Bayes classifier
Random forest classifier
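To give a feel for what one of these algorithms does, here is a small, self-contained Python sketch of K-Means clustering. This is not Mahout's API or implementation. It is a plain, single-machine version of the standard iterate-assign-update loop, and the function name, the naive "first k points" initialization, and the sample points are all assumptions made for illustration.

```python
import math


def k_means(points, k, iterations=10):
    """Cluster 2-D points into k groups; returns (centroids, clusters)."""
    # Assumption: use the first k points as starting centroids (simple and deterministic).
    centroids = [points[i] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, clusters


# Hypothetical data: two obvious groups of points.
points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
centroids, clusters = k_means(points, k=2)
print(centroids)
print(clusters)
```

A library such as Mahout becomes worthwhile when the points no longer fit on one machine; the loop itself stays conceptually the same.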
I went to UC Berkeley, and they offer many awesome courses there.
I wish I had more time. I would seriously consider taking this free MIT online class, which you can find here:
Azure is democratizing machine learning
Historically, machine learning has required complex software and high-end computers. This field of computing also required seasoned data scientists. What's been needed is a fully managed cloud service for this form of machine learning, also known as predictive analytics.
Welcome To ML Studio
Using simple drag-and-drop gestures along with some data flow graphs, you can set up experiments and take advantage of sophisticated algorithms without writing code.
Data Scientists Code in R
R is a popular open source programming environment for statistics and data mining. The good news is that it is easily integrated into ML Studio. I have a lot of friends using functional languages for machine learning, such as F#. It's pretty clear, however, that R is dominant in this space.
Polls and surveys of data miners show that R's popularity has increased substantially in recent years. R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is currently developed by the R Development Core Team, of which John Chambers, the creator of the S language, is a member. R is named partly after the first names of its first two authors. R is a GNU project and is written primarily in C, Fortran, and R.
Below is a framework that provides a way for you to think about the predictive nature of machine learning. It's all about providing insight for business decisions where limited resources are applied to grow revenue or limit expenses. This might include insights into consumer spending patterns or into optimizing the supply chain.
How to think about the analytics spectrum
One great way to think about machine learning is to break down analytics into 3 questions:
What will happen?
What should I do next?
How to think of the personas doing analytics
The information worker
Typically using a self-service approach using Power BI.
- Power BI for Office 365 is a self-service business intelligence (BI) solution delivered through Excel and Office 365 that provides information workers with data analysis and visualization capabilities to identify deeper business insights about their data
- Involved in data transformation, data warehousing, creating data merchant cubes for analytics, and data modeling
- Work for GMs or directors
- Deeply technical and skilled not just with code, but with mathematics, statistics, and probability
- Can use a variety of techniques to apply probability to predictions (e.g., there is a 42% chance that prices will go up in the next 18 hours)
- Like Monte Carlo simulations, parameterizing the model
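As a rough illustration of how a Monte Carlo simulation can turn a parameterized model into a probability statement like the one above, here is a minimal Python sketch. The model is an assumption chosen for simplicity: each hourly price change is drawn independently from a normal distribution. The parameter names `hourly_drift` and `hourly_volatility` are hypothetical, and the inputs are not real market data.

```python
import random


def probability_price_rises(hourly_drift, hourly_volatility,
                            hours=18, trials=10_000, seed=0):
    """Estimate the chance that the price is higher after `hours` hours."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same estimate
    rises = 0
    for _ in range(trials):
        # One simulated future: add up the random hourly price changes.
        change = sum(rng.gauss(hourly_drift, hourly_volatility)
                     for _ in range(hours))
        if change > 0:
            rises += 1
    # The fraction of simulated futures in which the price went up.
    return rises / trials


# With no drift the estimate should land close to 0.5;
# a positive drift pushes it toward 1.0.
print(probability_price_rises(hourly_drift=0.0, hourly_volatility=1.0))
print(probability_price_rises(hourly_drift=0.1, hourly_volatility=1.0))
```

Parameterizing the model here means choosing the drift and volatility, typically by fitting them to historical data. The simulation then reports how often the outcome of interest occurs under those assumptions.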
What to look for in a data scientist
- Domain Knowledge
Clear Understanding Of The Scientific Method
- Objectivity, Hypothesis, Validation, Transparency
Strong in Math and Statistics
Intellectual Curiosity and Critical Thinking
Visualization and Communication
Advanced Computing And Data Management
If you went to school to become a data scientist, what courses would you take?
Industries that really benefit from data science
This post provided a high-level view of some of the characteristics and concepts of machine learning. In the next post, we will start playing around with the Azure portal.
Published at DZone with permission of Bruno Terkaly, DZone MVB. See the original article here.