Machine Learning for Everyone!

Learn the basics of predictive modeling behind the most-used machine learning models: the random forest and decision tree.

We all know that machine learning is about handling data, but it can also be seen as the art of finding order in data by exploring the information it contains.

Some Background on Predictive Models

There are several types of predictive models. These models usually have several input columns and one target or outcome column, which is the variable to be predicted.

Predictive Model Table

So basically, a model maps inputs to an output, finding (sometimes mysteriously) the relationships among the input variables in order to predict the target variable.
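
To make the "input columns plus one target column" idea concrete, here is a minimal sketch in plain Python (not R, just to stay dependency-free). The column names `visits`, `plan`, and `uses_app` are made up for illustration and do not come from any real dataset.

```python
# Hypothetical customer table: two input columns and one target column.
rows = [
    {"visits": 7, "plan": "free", "uses_app": "yes"},
    {"visits": 2, "plan": "free", "uses_app": "no"},
    {"visits": 9, "plan": "paid", "uses_app": "yes"},
]

TARGET = "uses_app"  # the variable to be predicted


def split_inputs_and_target(table, target):
    """Separate each row into its input columns and its target value."""
    inputs = [{k: v for k, v in row.items() if k != target} for row in table]
    targets = [row[target] for row in table]
    return inputs, targets


inputs, targets = split_inputs_and_target(rows, TARGET)
print(inputs[0])  # the inputs a model would see for the first customer
print(targets)    # the outcomes a model would learn to predict
```

A predictive model is then anything that learns a function from rows like `inputs[0]` to values like those in `targets`.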

As you may notice, it has some commonalities with a human being who reads the environment, processes the information, and performs a certain action.

So What Is This Post About?

It's about becoming familiar with random forest (official algorithm site), implemented in R. It is one of the most-used predictive models because it is simple to tune and robust across many different types of data.

If you've never done a predictive model before and you want to, this may be a good starting point!

Don't Get Lost in the Forest!

Random Forest Bear

The basic idea behind it is to build hundreds or even thousands of simple, individually less robust models (decision trees) and combine them into a model that is more stable and less prone to overfitting than any single tree.

But how?

Each of these "tiny" decision trees sees just part of the whole data when producing its humble prediction. The final decision produced by the random forest model is the result of votes from all the decision trees, just like a democracy.
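
One common way to give each tree "just part of the data" is to draw a random sample of rows with replacement (a bootstrap sample) and a random subset of input columns. The sketch below shows only that sampling step, in plain Python rather than R. The helper names, the column names, and the seed are assumptions for illustration, not the internals of any particular package.

```python
import random


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement: some repeat, others are left out."""
    return [rng.choice(rows) for _ in rows]


def random_feature_subset(features, k, rng):
    """Pick k distinct input columns for a tree to consider."""
    return rng.sample(features, k)


rng = random.Random(42)  # fixed seed so the sketch is repeatable
rows = list(range(10))   # stand-in for 10 customer rows
features = ["visits", "plan", "country", "age"]  # hypothetical column names

for tree_id in range(3):
    sample = bootstrap_sample(rows, rng)
    cols = random_feature_subset(features, 2, rng)
    # Each tree ends up with a different view of the rows and columns.
    print(tree_id, sorted(set(sample)), cols)
```

Because every tree trains on a slightly different view, the trees make different mistakes, which is what makes their combined vote useful.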

But what is a decision tree?

You're already familiar with decision tree outputs. They produce "if-then" rules, such as, "If the user has more than five visits, then he or she will probably use the app."
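
Written as code, such a rule is just a nested `if`. The following Python sketch encodes the rule quoted above and adds a second, purely hypothetical rule on `days_since_last_visit` to show how a tree chains several conditions. A real tree would learn its thresholds from data rather than have them hard-coded.

```python
def predict_uses_app(visits, days_since_last_visit):
    """A tiny hand-written decision tree made of if-then rules."""
    if visits > 5:
        # The rule from the text: more than five visits -> probably uses the app.
        return "yes"
    if days_since_last_visit <= 3:
        # Hypothetical second rule, added only for illustration.
        return "yes"
    return "no"


print(predict_uses_app(visits=7, days_since_last_visit=30))
print(predict_uses_app(visits=2, days_since_last_visit=10))
```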

Decision Tree

Putting It All Together...

If a random forest has three trees (normally it has 500 or more) and a new customer arrives, then the prediction of whether that customer will buy a certain product will be "yes" if at least two of the three trees predict "yes."
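
The three-tree vote can be sketched in a few lines of Python. The individual tree predictions here are made-up values; the point is only the majority rule.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the largest number of trees."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical predictions from three trees for one new customer.
tree_votes = ["yes", "no", "yes"]
print(majority_vote(tree_votes))  # yes
```

With an odd number of trees and two possible labels there can be no tie, which is one reason small examples usually use three trees.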

Random Forest

Having hundreds of opinions (decision trees) tends to produce a more accurate result on average (random forests).
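
A rough way to see why many opinions help: suppose every tree is right 60% of the time and the trees err independently of each other. Both are simplifying assumptions; real trees in a forest are correlated, so actual gains are smaller. Under those assumptions, the chance that the majority is right follows from the binomial distribution, as this Python sketch computes.

```python
from math import comb


def majority_accuracy(n_trees, p_correct):
    """Probability that more than half of n_trees independent voters are right."""
    needed = n_trees // 2 + 1
    return sum(
        comb(n_trees, k) * p_correct**k * (1 - p_correct) ** (n_trees - k)
        for k in range(needed, n_trees + 1)
    )


# Odd tree counts avoid ties in a two-class vote.
for n in (1, 3, 25, 101):
    print(n, round(majority_accuracy(n, 0.6), 3))
```

With one tree the accuracy is the tree's own 0.6; with three trees it already rises to 0.648, and it keeps climbing as more independent trees are added.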

With this model, you will not be able to easily know how the model comes to assign a high or low probability to each input case. It acts more like a black box, similar to what is used for deep learning with neural networks, where every neuron contributes to the whole.

The next post will contain an example, based on real data, of how random forests order customers according to their likelihood of matching a certain business condition. It will also map around 20 variables into only two so that the analyst can see them in a plot!

Next Post

What Language Is Convenient for Learning Machine Learning?

Auth0 mainly uses R to create predictive models and to run other data processes. For example:

  • Finding relationships between app features, which impacts the engineering area.
  • Finding anomalies or abnormal behavior, which leads to the development of anomaly detection features.
  • Improving navigation of the web docs based on Markov chains (a user's likelihood of visiting Page B if they are on Page C).
  • Reducing times for answering support tickets using deep learning (not with R but with Keras).
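
As an illustration of the Markov chain item above, the likelihood of visiting Page B from Page C can be estimated by counting page-to-page transitions in browsing sessions. This is a minimal Python sketch with made-up sessions, not Auth0's actual implementation.

```python
from collections import Counter, defaultdict


def transition_probabilities(sessions):
    """Estimate P(next page | current page) from ordered page-visit sessions."""
    counts = defaultdict(Counter)
    for pages in sessions:
        # Pair each page with the page visited right after it.
        for current, nxt in zip(pages, pages[1:]):
            counts[current][nxt] += 1
    return {
        page: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for page, followers in counts.items()
    }


# Hypothetical browsing sessions over three docs pages.
sessions = [["C", "B", "A"], ["C", "B"], ["C", "A"], ["A", "C", "B"]]
probs = transition_probabilities(sessions)
print(probs["C"]["B"])  # 0.75: three of the four visits to C were followed by B
```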

If you want to develop your own data science projects, you should start with R. It has an enormous community from which you can learn (and teach!). It's not always just a matter of complex algorithms but also about having support when things don't go as expected. And this occurs often when you're doing new things.

Lastly, besides the many packages, libraries, free books, and free courses available for R (and for Python with pandas and numpy), there are more than 160,000 questions tagged with R on stackoverflow.com and another ~15,000 on stats.stackexchange.com to check out.

--

You can follow me on twitter.

More machine learning stories: Blog.

I invite you to check out the open source book I'm writing: Data Science Live Book

Cheers!



Published at DZone with permission of Pablo Casas, DZone MVB. See the original article here.
