Make Smart Predictions With Amazon Machine Learning

Get a crash course in the basic concepts of machine learning, learn to start with Amazon ML and its API, and get pointers on how to apply machine learning to your IoT app.

The Temboo Choreo library just got a new addition: Amazon's own Machine Learning service. It's an excellent way to get started with data-driven predictions in any application without bringing on a machine learning specialist. If you're looking for a straightforward supervised machine learning solution and your application doesn't call for a custom implementation, Amazon Machine Learning may be just the tool you're looking for. It's a version of one of the machine learning implementations that Amazon itself uses internally, so scalability is certainly a feature.

In this article, we'll give you a crash course in the basic concepts of machine learning, tell you where to start with Amazon ML and its API, and give you a few pointers on how you might approach applying machine learning to your Internet of Things application.

Amazon Machine Learning is designed for supervised machine learning tasks. Not so sure what we mean by that? No worries — just read on.

Putting Machine Learning Into Context

Here's an example of just one of the countless tasks that can be done with machine learning.

Let's say you make your living selling houses and you want to make pricing the houses you sell an easier task. You might decide to train a machine learning model to predict the price at which you should sell the house. To do so, you collect all the data you can find about houses sold in your area in the last five years or so.

[Figure: Housing data on a Zillow.com real estate map]

Your dataset includes several data points describing each house's features, like its area in square feet and the number of bedrooms it has. For each house, you also include the price at which it was sold. You make sure your data is clean and ready for processing and then you feed it into your machine learning model to "teach" it how to think about house prices. Then, you test your model out on some other house data whose price you already know. How well it guesses their actual sale prices lets you know how representative your initial dataset was and thus how helpful your machine learning model will be.

If your model is effective, you can give it data for a house you're hoping to sell and then get a good prediction for the price you should sell it for. What you do with that prediction is up to you.

Machine Learning Fundamentals

The underlying basis of a machine learning application is a statistical model that improves itself as it is given more data. In his book Deep Learning With Python, Google AI researcher François Chollet describes the distinction between classical programming and machine learning as follows:

Classical programming uses rules and data to produce answers. Machine learning, on the other hand, uses data and answers to produce rules.
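To make that distinction concrete, here's a minimal sketch using temperature conversion as a stand-in problem. In the classical version, we write the rule ourselves. In the machine learning version, we hand over inputs and known answers, and a least-squares fit recovers the rule. The function names and sample values are our own illustration, not part of any library.

```python
# Classical programming: we write the rule ourselves.
def celsius_to_fahrenheit(celsius):
    return celsius * 1.8 + 32.0


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Machine learning: we supply data (inputs) and answers (known outputs) ...
data = [-10.0, 0.0, 15.0, 30.0, 100.0]
answers = [celsius_to_fahrenheit(c) for c in data]

# ... and the fitting step works out the rule that connects them.
slope, intercept = fit_line(data, answers)
print(f"learned rule: f = {slope:.2f} * c + {intercept:.2f}")
```

Real problems are rarely this tidy, which is exactly why the rule has to be learned rather than written down.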

It's important to understand that not every problem is a good candidate for machine learning. Many of the questions you may have about your dataset are perfectly answerable using classic analytical methods. So when is machine learning a good choice for the task at hand? Amazon Machine Learning's documentation identifies two situations in which ML is the right choice:

  1. You can't explicitly program the behavior you want, because many variables influence the outcome and the logical rules that would produce it are unclear.

  2. The task is simple enough to perform on an individual basis but unreasonable to perform at scale.

Types of Machine Learning

The major types of machine learning are characterized by the kind of feedback given to the machine learning algorithm during its "learning" phase. Though Amazon Machine Learning is for performing supervised machine learning tasks, it's useful to have a basic understanding of the capabilities of each of the following major types:

  • Unsupervised. This kind of machine learning model finds the patterns in the data even though it is given no human help in interpreting the data.

    • In context: An example of an unsupervised ML task is clustering, in which the ML model determines which items in the datasets are related to each other. An example of clustering might be grouping similar news articles together without any humans needing to read and tag them with metadata.

  • Reinforcement. This kind of machine learning model repeatedly attempts to solve a problem and receives feedback, in the form of a reward or penalty, on how good each attempt was. Over time, it learns the behavior that earns the most reward.

    • In context: Reinforcement learning is used for many things, including teaching an ML model how to successfully play a video game or teaching a robot hand how to pick something up.

  • Supervised. This kind of machine learning is given a large volume of complete sets of data, after which it is asked to try to predict the value of a missing data point inside an incomplete set of data. This is what we can use Amazon Machine Learning to do.

    • In context: Our house pricing example given at the beginning of the article is a supervised machine learning task. Other examples include an ML model trained to detect whether a given email is spam. The training data for a model to perform such a task would be made up of many examples of emails that a human has identified and labeled as being either spam or not spam. That dataset should contain both spam and non-spam examples.

Terms to Know

  • Prediction. In supervised machine learning, we talk about our ML algorithm making predictions — that is, it makes its best guess at an answer based on the rules it has learned from the example data we gave it. Note that the use of the word "predict" in this context does not necessarily imply that the target value is something about the future. Rather, it is simply an unknown value for which we want a quality guess, no matter when it came about or will come about.

    • In context: In our house pricing example, we're getting predictions for the sale price of a house.

  • Model. The machine learning model is the heart of a machine learning application. It's the statistical algorithm that we "teach" to make predictions.

  • Feature. Features are the data points that we already know. We will ask our machine learning model to make a prediction based on these known data points.

    • In context: In our house pricing example, relevant features might include the number of bedrooms and the year of the house's construction.

  • Target. In supervised machine learning, the value we are asking our ML model to predict is called the target value or simply the target.

    • In context: In our housing example, the target of our predictions is the sale price of the house.

  • Labeled data. Labeled data is a dataset made up of complete observation data, i.e. a set of features that includes the correct corresponding target value.

    • In context: Labeled data for our housing example might look something like this if we were to format it as a table:

      Area (sqft) | Floors | BR | BA | Yard (sqft) | Garage | Pool | School District | Exterior | Roof Material | Year Built | Year Sold | Price
      1601        | 1      | 3  | 2  | 11902       | y      | n    | 109             | Wood     | Asphalt       | 1974       | 2008      | 145111
      1840        | 1      | 3  | 2  | 128840      | n      | y    | 108             | Brick    | Asphalt       | 1946       | 2017      | 220000


  • Training. This is the learning component of machine learning. In this part of the process, we help the model understand the relationships between the features we already know and the target value that we will ask it to predict based on that data. This is done by giving it a lot of labeled data.

  • Classification vs. regression. In supervised learning, there are two types of predictions that might be made. If your target is a numerical value, the prediction is known as a regression task.

    • In context: If we want our house pricing model to tell us the number of thousands of dollars a given house might sell for, then our prediction task is a regression task.

    On the other hand, if you would like to predict the value of a target that can only be one of a finite set of values, that's known as a classification task. There are two types of classification tasks: binary and multiclass classification. Binary classification tasks predict a target that has two possible values and multiclass classification tasks predict a target that has three or more potential target values.

    • In context: If, for some reason, we only need to know whether the given house would sell for more or less than $300,000, then that's a binary classification task. If we had bucketed the house prices into ranges such as less than $200k, $200k-$400k, $400k-$700k, and more than $700k, and we only wanted predictions about which of those ranges the given house would fall into, that would be a multiclass classification task.

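  • Fit. How well a machine learning model captures the patterns in its data is known as the model's fit. A model that underfits misses those patterns, while one that overfits memorizes its training data and predicts poorly on new data. Learn more about model fit in the Amazon ML documentation.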
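Here's a small sketch that ties several of these terms together using the two labeled observations from the housing example. It separates the features from the target, then shows how the same sale price can serve as a regression target, a binary classification target, or a multiclass classification target. The column names, the bucket names, and the handling of prices that land exactly on a boundary are our own assumptions.

```python
import csv
import io

# The two labeled observations from the housing example, as CSV text.
# Column names are illustrative assumptions.
LABELED_CSV = """\
area_sqft,floors,bedrooms,bathrooms,yard_sqft,garage,pool,school_district,exterior,roof_material,year_built,year_sold,price
1601,1,3,2,11902,y,n,109,Wood,Asphalt,1974,2008,145111
1840,1,3,2,128840,n,y,108,Brick,Asphalt,1946,2017,220000
"""

TARGET = "price"


def split_features_and_target(row, target=TARGET):
    """Separate the known data points (features) from the value to predict."""
    features = {name: value for name, value in row.items() if name != target}
    return features, float(row[target])


def binary_target(price, threshold=300_000):
    """Binary classification: 1 if the house sold above the threshold, else 0."""
    return 1 if price > threshold else 0


def multiclass_target(price):
    """Multiclass classification: which price range the house falls into."""
    if price < 200_000:
        return "under_200k"
    if price < 400_000:
        return "200k_to_400k"
    if price <= 700_000:
        return "400k_to_700k"
    return "over_700k"


rows = list(csv.DictReader(io.StringIO(LABELED_CSV)))
for row in rows:
    features, price = split_features_and_target(row)
    print(
        len(features), "features",
        "| regression target:", price,
        "| binary:", binary_target(price),
        "| multiclass:", multiclass_target(price),
    )
```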

The Supervised Machine Learning Process With Amazon ML

The quickest way to understand the Amazon ML process is hands-on with their interactive tutorial project, but we'll give you a quick rundown and show which Temboo Choreos you can use at each step along the way.

1. Prepare a Training Dataset

The majority of your effort in performing any type of machine learning task will be spent on the most important step of all: preparing a dataset for training your machine learning model. Collecting and cleaning the data requires care, and the time you take to plan ahead will have a strong influence on the worth of the machine learning model that results from training with this data.

Before you do anything else, you should determine what question you're asking of your data. Consider whether you really need a numerical value as an answer or whether your problem is actually a classification task.

The principle of garbage in, garbage out applies to machine learning as much as it does to anything that relies on statistical methods or data processing. It's important to make sure that your training data is made up of observations consisting of data points that are relevant to your target. It's possible that not all of the data you have available will have meaningful features for your eventual training dataset. Giving your data a thorough check-up with traditional analytical methods will help you determine which features should stay in your training data and which ones should get thrown out. Here's what to look for when analyzing your potential training data.

  • You may find it necessary to perform some preliminary computations on your initial feature values in order to turn them into useful values for ML model training purposes.

    • In context: It's best to first convert raw sensor data to a human-readable unit, for example, converting temperature sensor data from raw values into degrees Fahrenheit or Celsius.

  • Converting a feature with continuous values into a finite number of discrete ranges of values may be a better choice than using a numerical value. Likewise, a number that only serves as a label should be treated as a category rather than a quantity.

    • In context: In the example data given above for the house dataset, school districts are represented by numbers. School districts, however, are not in any way related to the value of the number used to label them. This number can be thought of as a category or name rather than a numerical value and should be treated as such.
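The three kinds of clean-up described above might be sketched like this. The sensor scale factor and the year ranges are hypothetical placeholders; use the values from your own sensor's datasheet and whatever ranges make sense for your data.

```python
def raw_to_celsius(raw_reading, scale=0.0625):
    """Convert a raw sensor count to degrees Celsius.

    The scale factor is a hypothetical example, not a value from any
    particular sensor.
    """
    return raw_reading * scale


def bucket_year_built(year):
    """Turn a continuous feature into a few discrete ranges (binning)."""
    if year < 1950:
        return "pre_1950"
    if year < 1990:
        return "1950_1989"
    return "1990_or_later"


def as_category(district_number):
    """Keep a numeric label as text so it isn't mistaken for a quantity."""
    return f"district_{district_number}"


print(raw_to_celsius(400))      # raw count -> degrees Celsius
print(bucket_year_built(1974))  # continuous value -> discrete range
print(as_category(109))         # numeric label -> category
```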

Read what Amazon ML has to say about collecting labeled data to train your machine learning model and see the section on feature processing for more details on ensuring your data set is fit for training a model.

2. Train the Machine Learning Model

Once your data is collected, cleaned, and properly formatted, it's time to upload it to your AWS S3 bucket and create a new data source from the Amazon Machine Learning console.

There are Temboo Choreos for every step of the training process, but unless you're regularly creating multiple machine learning models, it's probably more efficient to take care of these steps in the ML Console than to do it programmatically. Here are the Choreos you would need:

  • CreateDataSourceFromS3: Creates a DataSource object.
  • CreateMLModel: Creates a new MLModel using the DataSource and the recipe as information sources.
  • DescribeMLModels: Returns a list of MLModels that match the search criteria in the request.
  • GetMLModel: Returns an MLModel that includes detailed metadata, data source information, and the current status of the MLModel.
  • UpdateMLModel: Updates the MLModelName and the ScoreThreshold of an MLModel.
  • AddTags: Adds one or more tags to an object, up to a limit of 10.
  • DescribeTags: Describes one or more of the tags for your Amazon ML object.
  • DeleteTags: Deletes the specified tags associated with an ML object.
  • DeleteMLModel: Assigns the DELETED status to an MLModel, rendering it unusable.
  • DeleteDataSource: Assigns the DELETED status to a DataSource, rendering it unusable.
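If you do decide to script the training steps, the first two operations in the list can be sketched as follows. This example swaps in the AWS SDK for Python (boto3) and its `machinelearning` client in place of the Temboo Choreos, since the operation names match. The IDs, display names, and attribute names are hypothetical, and the schema layout follows our reading of the Amazon ML documentation, so check it against the docs before relying on it.

```python
import json

# Attribute names are illustrative. The school district is marked
# CATEGORICAL because its number is a label, not a quantity.
HOUSE_ATTRIBUTES = {
    "area_sqft": "NUMERIC",
    "bedrooms": "NUMERIC",
    "school_district": "CATEGORICAL",
    "exterior": "CATEGORICAL",
    "year_built": "NUMERIC",
    "price": "NUMERIC",
}


def build_schema(attribute_types, target):
    """Describe the CSV columns so Amazon ML knows how to read them."""
    return {
        "version": "1.0",
        "targetAttributeName": target,
        "dataFormat": "CSV",
        "dataFileContainsHeader": True,
        "attributes": [
            {"attributeName": name, "attributeType": kind}
            for name, kind in attribute_types.items()
        ],
    }


def train(client, s3_location,
          data_source_id="ds-houses-training",
          model_id="ml-house-prices"):
    """Create a data source from a CSV file in S3, then a regression model."""
    schema = build_schema(HOUSE_ATTRIBUTES, target="price")
    client.create_data_source_from_s3(
        DataSourceId=data_source_id,
        DataSourceName="House sales (training)",
        DataSpec={
            "DataLocationS3": s3_location,
            "DataSchema": json.dumps(schema),
        },
        ComputeStatistics=True,  # needed for data sources used in training
    )
    client.create_ml_model(
        MLModelId=model_id,
        MLModelName="House price regression",
        MLModelType="REGRESSION",  # or "BINARY" / "MULTICLASS"
        TrainingDataSourceId=data_source_id,
    )
    return model_id
```

With AWS credentials configured, you could call it as `train(boto3.client("machinelearning"), "s3://your-bucket/houses.csv")`, where the bucket and file name are placeholders for your own.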

Temboo also has Choreos for AWS S3, which can come in handy when uploading new datasets for creating data sources.

3. Evaluate the Accuracy of the Machine Learning Model

When training a machine learning model in supervised machine learning, we set aside a portion of the training dataset for testing the model after we have trained it. This way, we can get an approximate idea of whether our machine learning model turned out to be accurate after training.

Amazon ML does this for you automatically and provides simple utilities for evaluating your ML model in the future, should you have new labeled data available to use for the evaluation.

If you would like to regularly and programmatically evaluate your ML model using a new labeled dataset, look for the Choreos that correspond to Amazon ML's evaluation operations, such as CreateEvaluation and GetEvaluation.
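To see what a hold-out evaluation involves, here's a minimal sketch in plain Python. It sets aside 30% of a labeled dataset, uses the average training price as a stand-in "model," and scores it with root mean square error (RMSE), a common metric for regression tasks. The sale prices are hypothetical, and this illustrates the idea only; it isn't a description of what Amazon ML runs internally.

```python
import math
import random


def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle the rows and set aside a fraction of them for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * (1 - test_fraction)))
    return shuffled[:cut], shuffled[cut:]


def rmse(actual, predicted):
    """Root mean square error: lower means predictions are closer to reality."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical sale prices standing in for a labeled dataset.
prices = [145_111, 220_000, 180_000, 310_000, 265_000,
          199_000, 240_000, 402_000, 155_000, 289_000]

train_prices, test_prices = train_test_split(prices)

# Stand-in model: always predict the average price seen during training.
baseline = sum(train_prices) / len(train_prices)
score = rmse(test_prices, [baseline] * len(test_prices))
print(f"baseline RMSE on held-out data: {score:,.0f}")
```

A trained model is only worth using if it beats a simple baseline like this one on data it has never seen.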

4. Use the Model to Make Predictions

Now, you're ready to benefit from all your hard work and generate predictions from your data. In Amazon ML, there are two ways to make predictions: you can make individual predictions in real-time or you can make batch predictions for multiple observations all at once. You should use the method that's appropriate for the level of urgency for your application in accessing those predictions. The difference between the two is that batch predictions can handle multiple rows of observation data and take more time to produce.

The dataset you'll send to Amazon ML to generate predictions should look exactly the same as your training dataset, with one exception: it won't include the target value. For batch predictions, you'll need to upload a properly formatted CSV file containing rows of observations to AWS S3; then, you'll create an Amazon ML data source from that file. For real-time predictions, you can either manually enter observation data in the Amazon ML console or you can use the Amazon ML API. When using the API, you'll need to build a JSON string containing your observation as a set of attribute names and values. It may take a bit of experimentation to get your JSON string properly formatted to match the schema of your training dataset.
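As a rough sketch of the real-time case, the code below turns one observation into the kind of record a prediction request expects: the target is dropped and every value is sent as a string. It then shows a request using the AWS SDK for Python (boto3) in place of the Predict Choreo. The attribute names are illustrative and must match whatever schema you trained with; the model ID and endpoint URL come from your own account.

```python
import json


def build_record(observation, target="price"):
    """Drop the target and stringify every value for a real-time request."""
    return {
        name: str(value)
        for name, value in observation.items()
        if name != target
    }


def predict_price(client, model_id, endpoint_url, record):
    """Ask a regression model for a real-time prediction."""
    response = client.predict(
        MLModelId=model_id,
        Record=record,
        PredictEndpoint=endpoint_url,
    )
    return response["Prediction"]["predictedValue"]


# One observation from the housing example, without a known sale price.
observation = {
    "area_sqft": 1840,
    "bedrooms": 3,
    "school_district": 108,
    "exterior": "Brick",
    "year_built": 1946,
    "price": None,
}

record = build_record(observation)
print(json.dumps(record, indent=2))
```

Printing the record before you send it is a quick way to compare it against your training schema while you experiment.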

[Figure: Generating code to predict with Amazon ML]

Here are the Choreos you might need once you're using your model to make predictions:

  • Temboo's AWS S3 Choreos: Everything you need for S3 file management
  • CreateDataSourceFromS3: Creates a DataSource object.
  • CreateRealtimeEndpoint: Creates a real-time endpoint for the ML model. The endpoint contains the URI of the ML model, which is the location to which to send real-time prediction requests for the specified ML model.
  • Predict: Generates a prediction for the observation using the specified ML model.
  • CreateBatchPrediction: Generates predictions for a group of observations.
  • UpdateBatchPrediction: Updates the BatchPredictionName of a BatchPrediction.
  • AddTags: Adds one or more tags to an object, up to a limit of 10.
  • DescribeTags: Describes one or more of the tags for your Amazon ML object.
  • DeleteTags: Deletes the specified tags associated with an ML object.
  • DeleteBatchPrediction: Assigns the DELETED status to a BatchPrediction, rendering it unusable.
  • DeleteDataSource: Assigns the DELETED status to a DataSource, rendering it unusable.
  • DeleteRealtimeEndpoint: Deletes a real-time endpoint of an MLModel.
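Two of the operations above, sketched with the AWS SDK for Python (boto3) standing in for the Choreos of the same name: creating a real-time endpoint to get the URL that prediction requests are sent to, and starting a batch prediction over a data source. The IDs, display name, and output location are hypothetical placeholders.

```python
def realtime_endpoint_url(client, model_id):
    """Create a real-time endpoint and return the URL to send requests to."""
    response = client.create_realtime_endpoint(MLModelId=model_id)
    return response["RealtimeEndpointInfo"]["EndpointUrl"]


def start_batch_prediction(client, model_id, data_source_id, output_uri,
                           batch_id="bp-house-prices"):
    """Queue predictions for every observation in a data source.

    Results are written to the S3 location given by output_uri.
    """
    client.create_batch_prediction(
        BatchPredictionId=batch_id,
        BatchPredictionName="House price batch",
        MLModelId=model_id,
        BatchPredictionDataSourceId=data_source_id,
        OutputUri=output_uri,
    )
    return batch_id
```

The real-time route suits a single observation that needs an answer right away, while the batch route suits many rows that can wait.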

Machine Learning for IoT

Using supervised machine learning in conjunction with the Internet of Things presents some exciting possibilities and interesting challenges. IoT has tremendous potential to generate vast amounts of data — and the opportunity is ripe for machine learning to play an impactful role.

The first challenge of machine learning for IoT is the relatively limited amount of computational resources found on embedded devices. This is the very reason that the cloud and edge device model has become so prevalent: we're offloading intensive processing to a more powerful, more central computer. It's the perfect context for a machine learning cloud service.

Perhaps the greatest hurdle for any ML application, IoT or otherwise, is collecting a meaningful and representative training dataset. Understand that this step may be very time-consuming, depending on the nature of the data you're collecting.

To build your initial training dataset, consider that depending on what your model will be predicting, you may need to collect data points from multiple devices — perhaps in disparate physical locations. You may also consider gathering some of your data from third-party sources, such as local weather data from a weather service API, or generating natural language metrics using a service, such as the Google Cloud Natural Language API.

Depending on the number of sources that pool to create your dataset, you may need to gather all of your data points in a central location to then properly format and send to AWS. It's up to you whether you do that locally on your own server or a gateway device, or you can choose to do it through a cloud services database API or something as simple as Google Sheets.

Applications of machine learning in IoT include predictive maintenance and optimizing equipment performance and system efficiency. For example, to understand factory equipment, you might begin gathering data about energy expenditure, vibration, temperature, product scrap rates, and machinery malfunctions. That's just the beginning. Anywhere that you place devices to monitor physical conditions could have potential as a machine learning data source.
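Pooling readings from several devices, adding a third-party data point, and formatting the result for upload might look something like this sketch. The device names, field names, and readings are all hypothetical; the point is simply to end up with one CSV row per observation.

```python
import csv
import io

FIELDS = ["device_id", "hour", "temperature_c", "vibration_mm_s", "outdoor_temp_c"]


def merge_readings(device_readings, weather_by_hour):
    """Flatten per-device readings into rows and attach weather data."""
    rows = []
    for device_id, readings in sorted(device_readings.items()):
        for reading in readings:
            rows.append({
                "device_id": device_id,
                "hour": reading["hour"],
                "temperature_c": reading["temperature_c"],
                "vibration_mm_s": reading["vibration_mm_s"],
                # Leave the field empty when no weather value is available.
                "outdoor_temp_c": weather_by_hour.get(reading["hour"], ""),
            })
    return rows


def to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical readings from two devices in different locations.
device_readings = {
    "press-01": [{"hour": "2018-05-01T09", "temperature_c": 61.5, "vibration_mm_s": 2.1}],
    "lathe-07": [{"hour": "2018-05-01T09", "temperature_c": 48.0, "vibration_mm_s": 4.7}],
}
# Hypothetical outdoor temperature from a weather service.
weather_by_hour = {"2018-05-01T09": 17.0}

print(to_csv(merge_readings(device_readings, weather_by_hour)))
```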

Further Reading on Machine Learning

Without a doubt, Amazon Machine Learning's documentation is among its best features. With thorough and accessible explanations of what you need to know about how machine learning works, as well as guides on every step of the process and an interactive tutorial project, you'll be prepared to make the most of Amazon Machine Learning like a pro.

The beauty of Amazon Machine Learning is that you really don't need to know all of the ins and outs of machine learning in order to use it. It is a fascinating field, though, and there are many excellent free resources out there to help you understand it better.

Published at DZone with permission of
