
Machine Learning: Supervised Styling


There are currently three main ways to educate a computer: supervised learning, unsupervised learning, and reinforcement learning. The first, Supervised Learning, is what we’ll be talking about today.



Learning!

Apparently, computers can do it now. Kind of. Well, not really. In fact, a 4-year-old can still learn much more efficiently, and from a fraction of the input. Even with an orangutan for a teacher. But let’s not get political, my pretties. Shame on you.


What Is It?

Supervised Learning involves taking a set of labelled data (examples where the correct answer is already known) and splitting it up randomly into a training (or teaching) set and a test set. The training set is fed into an algorithm, which attempts to learn enough from it to predict the answers in the test set, which it has never seen. If it comes out with a decent divination of the test set’s known answers, then the model is solid. This is why it’s called supervised: we’re essentially showing the algorithm the "correct" outcomes before feeding it unknown stuff.
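To make the train/test idea concrete, here is a minimal sketch in plain Python (no scikit-learn), assuming a hypothetical one-feature dataset generated on the fly. It shuffles the rows, holds 40% of them out as a test set, fits a straight line to the training rows with ordinary least squares, and then checks how well that line predicts the held-out rows. The data, the `fit_line` and `mean_absolute_error` helpers, and the choice of model are illustrative assumptions, not part of the original article.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

def mean_absolute_error(actual, predicted):
    """Average absolute gap between true labels and predictions."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical labelled data: y is roughly 3 * x + 5, plus some noise.
rng = random.Random(101)
data = [(x, 3 * x + 5 + rng.uniform(-1, 1)) for x in range(100)]

# Randomly split the rows: 60% for training, 40% held out for testing.
rng.shuffle(data)
split = int(len(data) * 0.6)
train, test = data[:split], data[split:]

# "Supervised" step: the algorithm sees both the inputs and the correct answers.
slope, intercept = fit_line([x for x, _ in train], [y for _, y in train])

# Evaluation step: predict the unseen test rows and compare with their true labels.
predictions = [slope * x + intercept for x, _ in test]
error = mean_absolute_error([y for _, y in test], predictions)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, test MAE={error:.2f}")
```

A small test error here means the line learned from the training rows generalizes to rows it never saw, which is exactly the check the test set exists for.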

I won’t get into the bits and pieces of how to do this just yet (I’ll save that for another article), but the long and short of it is that the scikit-learn library makes all of this super easy to do. For example, let’s look at splitting up the data (step 1).

Assume that you’ve already loaded some US housing data into a pandas DataFrame called USAhousing, and you want to train your algorithm to predict the price of a house from several other variables. First, split the columns into two groups: the variable you want to predict (y) and the input features you’ll use to predict it (X):

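# Features (X): the columns the model will use to make its prediction
X = USAhousing[['Avg. Area Income', 'Avg. Area House Age', 'Avg. Area Number of Rooms', 'Avg. Area Number of Bedrooms', 'Area Population']]
# Target (y): the value we want the model to predict
y = USAhousing['Price']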
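If it helps to see what that selection produces, here is a plain-Python sketch of the same idea using two made-up rows. Only the column names come from the article; the numbers, and the `FEATURES`/`TARGET` names, are hypothetical. Each row of X holds the five input features, and the matching position in y holds that house's price, so X[i] and y[i] always describe the same house.

```python
# Column names taken from the article; the order defines the layout of each X row.
FEATURES = [
    "Avg. Area Income",
    "Avg. Area House Age",
    "Avg. Area Number of Rooms",
    "Avg. Area Number of Bedrooms",
    "Area Population",
]
TARGET = "Price"

# Two hypothetical housing records; the values are invented for illustration.
rows = [
    {"Avg. Area Income": 70000.0, "Avg. Area House Age": 6.0,
     "Avg. Area Number of Rooms": 7.0, "Avg. Area Number of Bedrooms": 4.0,
     "Area Population": 35000.0, "Price": 1200000.0},
    {"Avg. Area Income": 60000.0, "Avg. Area House Age": 5.0,
     "Avg. Area Number of Rooms": 6.0, "Avg. Area Number of Bedrooms": 3.0,
     "Area Population": 40000.0, "Price": 900000.0},
]

# X: one list of feature values per house, in FEATURES order.
X = [[row[col] for col in FEATURES] for row in rows]
# y: the value to predict, aligned with X by position.
y = [row[TARGET] for row in rows]

print(X[0])  # [70000.0, 6.0, 7.0, 4.0, 35000.0]
print(y)     # [1200000.0, 900000.0]
```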

Now you just need to import one function from scikit-learn:

from sklearn.model_selection import train_test_split

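Then apply train_test_split to X and y. It randomly assigns each row to either the training set or the test set and returns four objects: the training features, the test features, the training targets, and the test targets. Here, test_size=0.4 holds out 40% of the rows for testing, and random_state=101 fixes the random seed so you get the same split every time you run it: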

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=101)
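For intuition about what that call does, here is a simplified, stdlib-only sketch of a random train/test split. It is not scikit-learn's implementation, and the function name `simple_train_test_split` is hypothetical. It shuffles the row indices with a seeded random generator (playing the role of `random_state`), rounds the test count up from `test_size`, and applies the same index lists to X and y so features and targets stay aligned.

```python
import math
import random

def simple_train_test_split(X, y, test_size=0.25, random_state=None):
    """Hypothetical, simplified stand-in for a random train/test split."""
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of rows")
    n = len(X)
    n_test = math.ceil(test_size * n)  # round the test count up
    indices = list(range(n))
    random.Random(random_state).shuffle(indices)  # seeded, so repeatable
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    # The same index lists are applied to X and y, keeping rows aligned.
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test

# Usage with toy data: 10 rows, 40% held out for testing.
X = [[i, i * 2] for i in range(10)]
y = [i * 10 for i in range(10)]
X_train, X_test, y_train, y_test = simple_train_test_split(
    X, y, test_size=0.4, random_state=101
)
print(len(X_train), len(X_test))  # 6 4
```

Calling it twice with the same random_state returns the same split, which is the property that makes a fixed seed such as 101 useful for reproducible experiments.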

Once you have this, you can begin to teach your computer. But we’ll continue with that in another article. Stay tuned!

Feel free to leave any feedback or questions in the comments section.

mattdata.com

Topics:
artificial intelligence, data science blog, deep learning, machine learning, scikit-learn, supervised learning

Opinions expressed by DZone contributors are their own.
