Using Google Prediction API in .NET

Readers who work on more than routine data-in/data-out software will know something about the different areas of computing and the techniques used to solve problems in them. One of the most powerful approaches to solving problems and automating everyday tasks is Artificial Intelligence, particularly machine learning. A common and well-known part of machine learning is predicting the future behavior of a system from past data, which is usually fed to it in a training phase. A good example is Bayesian spam filtering, where we train the filter on sample data and then use it to classify incoming content as legitimate or spam.

As you can guess, these techniques have many applications in the real world. Another application, which you may have used on some sites without even noticing, is the recommendation system. As an example, one of my friends in Iran, who’s going to earn his M.Sc. degree in Artificial Intelligence, has recently launched a service that lets users see a set of news items fetched from various sources. Users click on the news items, and after a while the system learns about their interests and displays the items they are more likely to view.

Implementing a predictor with such techniques takes time and requires a background that many programmers do not have, yet the need for such features is growing because they improve the quality and usefulness of many programs. This was the motivation for Google to launch a service called Google Prediction API, which lets programmers pass their data to a centralized server managed by Google and ask it to make predictions about that data.

Google Prediction API exposes a public RESTful API. Luckily, there is a good implementation of Google Prediction API for .NET, available as an open source project on CodePlex, and the download includes a client application for testing the API. In this post I’m going to give a quick overview of the Google Prediction API service based on this .NET library.

Prerequisites

To use Google Prediction API, you need a Google account as well as a Google APIs Console project that has two APIs enabled:

  • Google Storage API: This is where you put your data, specifically your sample training data.
  • Google Prediction API: The main service you use to predict different characteristics of current data based on the past.

Google offers up to 100 API queries per day for Google Prediction API for free, and you have to use one of the commercial options if you need to exceed this limit. Note that Google Storage API can’t be used unless you have billing enabled in your Google account. Although Google Storage is fully commercial, you can use a trial version for free until the end of 2011. Google Prediction API assumes that you’ve put your data in a bucket on Google Storage, either manually (through the web interface or one of the explorer tools) or automatically (using the API). In this post I also assume that I’ve put a sample data.csv file in a bucket called keyvan.
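
Because the free tier is capped at 100 queries per day, it can help to count queries on the client side before calling the service. The following is only a minimal sketch: DailyQuota is a hypothetical helper that is not part of Google Prediction API for .NET, and it assumes the quota resets at the start of each UTC calendar day, which I have not confirmed against Google's documentation. It is also not thread-safe.

```csharp
using System;

// Hypothetical helper, not part of Google Prediction API for .NET.
// Assumption: the quota resets when the UTC calendar day changes.
public sealed class DailyQuota
{
    private readonly int limit;
    private DateTime currentDay;
    private int used;

    public DailyQuota(int limit)
    {
        if (limit < 0) throw new ArgumentOutOfRangeException("limit");
        this.limit = limit;
    }

    // Returns true and counts one query if the limit has not been reached
    // for the day that utcNow falls on; returns false otherwise.
    public bool TryConsume(DateTime utcNow)
    {
        if (utcNow.Date != currentDay)
        {
            // A new day has started, so the counter starts over.
            currentDay = utcNow.Date;
            used = 0;
        }

        if (used >= limit) return false;

        used++;
        return true;
    }
}
```

A caller would create new DailyQuota(100) once, call TryConsume(DateTime.UtcNow) before each prediction request, and skip the request when it returns false.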

In addition, if you want to use the .NET library, you need to download two assemblies: Google Prediction API for .NET and Json.NET. Both files can be downloaded from the CodePlex workspace of Google Prediction API for .NET.

Overview

There are three main steps necessary to use Google Prediction API:

  • Authorization: You authorize your user.
  • Training: You send a set of data to train your system with. The quality, amount, and relevance of your training data have a huge impact on the precision of your predictions.
  • Prediction: Based on the trained data, you can ask Google to predict a characteristic of a new item, and this prediction can be based on textual, numeric, or mixed data.

I’ll try to cover these steps in the next few sections with an example that is a prototype of a very simple spam filter.

Authorization

The authorization step can be done in two ways: with your Google username and password, or with an authorization key. You need to use the first approach at least once to retrieve an authorization key for later use. From then on, using the key is the recommended approach, since your Google username and password are most likely shared with many other services, including your mailbox, so you don’t want to expose them just to call a service API.

You can authorize yourself by creating a GooglePredictionClient object and passing the required information to its constructor. Besides the username and password, you also need to provide a bucket name and the name of the training data file, which I will describe later.

// Authorization
GooglePredictionClient client = new GooglePredictionClient("username@gmail.com", "password", "bucket", "data.csv");

This automatically authorizes you to use the service. You can then read your authorization key from the Auth property of the ClientLogin object and pass it to another constructor of GooglePredictionClient later.

// Retrieving the authorization key
Console.WriteLine(string.Format("Your authorization key is:\n{0}", client.ClientLogin.Auth));

Training

You need to upload your training data to Google Storage, using either an existing online or offline tool or the API. This is beyond the scope of this post, but the simplest way is to use the online explorer that Google provides.

The training data has a comma-separated value format described by Google, and I use a simple training file for spam filtering.
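
To give an idea of what such a file can look like, here is a minimal sketch that builds training rows in C#. It assumes the layout Google documented for the service at the time: one example per row, the label in the first column, the text feature after it, and no header row. TrainingCsv is a hypothetical helper that is not part of Google Prediction API for .NET, the labels "spam" and "ham" are illustrative and not taken from my own data.csv, and doubling embedded quotes is the common CSV convention rather than something I have verified against Google's parser.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of Google Prediction API for .NET.
public static class TrainingCsv
{
    // Wraps a field in double quotes and doubles any embedded quotes
    // (common CSV convention; assumed, not verified against Google's parser).
    public static string Quote(string value)
    {
        return "\"" + value.Replace("\"", "\"\"") + "\"";
    }

    // Builds one training row: label first, then the text feature.
    public static string BuildRow(string label, string text)
    {
        return Quote(label) + "," + Quote(text);
    }

    // Builds the whole file content, one example per line.
    // Each pair holds the label as Key and the text as Value.
    public static string Build(IEnumerable<KeyValuePair<string, string>> examples)
    {
        StringBuilder builder = new StringBuilder();
        foreach (KeyValuePair<string, string> example in examples)
        {
            builder.Append(BuildRow(example.Key, example.Value)).Append('\n');
        }
        return builder.ToString();
    }
}
```

The resulting string can be saved as data.csv and uploaded to the bucket with any of the tools mentioned above.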

The training step is also easy with the library for .NET. All you need to do is call the Train method on the GooglePredictionClient object to train the system.

// Training
TrainingResponse trainingResponse = client.Train();
if (trainingResponse.Success)
    Console.WriteLine("Training succeeded.");
else
    Console.WriteLine(string.Format("Training failed: {0}.", trainingResponse.ResponseStatusCodeDescription));

You can also delete your current training model by calling the Delete method.

// Deleting the training model
DeleteResponse deleteResponse = client.Delete();
if (deleteResponse.Success)
    Console.WriteLine("Delete succeeded.");
else
    Console.WriteLine(string.Format("Delete failed: {0}.", deleteResponse.ResponseStatusCodeDescription));

Of course, it’s also possible to change the training data and retrain your model using the ChangeData method.

// Changing the training model
client.ChangeData("keyvan", "data.csv");
trainingResponse = client.Train();
if (trainingResponse.Success)
    Console.WriteLine("Training succeeded.");
else
    Console.WriteLine(string.Format("Training failed: {0}.", trainingResponse.ResponseStatusCodeDescription));

It’s also possible to check the status of training (as it may take a while to finish) by calling the Check method.

// Checking the status of training
MetadataResponse metaResponse = client.Check();
Console.WriteLine(metaResponse.ModelInfo);
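
Since training may take a while, a caller may want to repeat the check until the model is ready. The following is only a sketch of a generic polling helper: Poller and WaitUntil are hypothetical names that are not part of the library, and the isDone delegate is left to the caller because I don't describe here what ModelInfo contains when training has finished.

```csharp
using System;
using System.Threading;

// Hypothetical helper, not part of Google Prediction API for .NET.
public static class Poller
{
    // Calls isDone up to maxAttempts times, sleeping for delay between
    // attempts. Returns true as soon as isDone returns true, or false
    // if every attempt returned false.
    public static bool WaitUntil(Func<bool> isDone, int maxAttempts, TimeSpan delay)
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            if (isDone()) return true;

            // No need to wait after the last attempt.
            if (attempt < maxAttempts) Thread.Sleep(delay);
        }
        return false;
    }
}
```

In practice the delegate would call client.Check() and inspect the response. Keep in mind that each check may count against the daily query limit, so a generous delay is a sensible choice.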

Prediction

Obviously, the main purpose of the whole process is to predict something. Google Prediction API provides mechanisms to predict based on numeric or textual data as well as a combination of both. Google Prediction API for .NET provides different methods for different numerical types supported in the .NET Framework as well as appropriate methods for textual and mixed data predictions.

You can create an instance of the generic NumericPrediction (by passing the appropriate numeric type), TextPrediction, or MixedPrediction class and pass it to the PredictNumeric, PredictText, or PredictMixed method, respectively. These objects receive an array of features as their input, and the result of the prediction methods is an instance of PredictionResponse.

You can get the general output label from the OutputLabel property of PredictionResponse, and you can also iterate through the OutputMulti property to access the individual MultiOutput items with their own output labels and prediction scores. In the case of spam filtering, these scores can be compared against a threshold to detect spam items.

// Prediction
TextPrediction sample = new TextPrediction("Some bad words common in spamming!");
            
PredictionResponse prediction = client.PredictText(sample);

Console.WriteLine(string.Format("The output label is {0}.", prediction.OutputLabel));
foreach (MultiOutput output in prediction.OutputMulti)
{
    Console.WriteLine(string.Format("Output label {0} with score {1}.", output.Label, output.Score));
}
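
The threshold idea mentioned earlier can be sketched as follows. This is an illustration under assumptions rather than part of the library: SpamThreshold is a hypothetical helper, the label name "spam" and any threshold value depend on your own training data, and the scores are passed in as a plain dictionary that you would fill from the Label and Score values of the MultiOutput items, assuming Score can be converted to a double.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of Google Prediction API for .NET.
public static class SpamThreshold
{
    // Returns true when the score recorded for spamLabel is at or above
    // the threshold. A missing label is treated as "not spam".
    public static bool IsSpam(IDictionary<string, double> scores, string spamLabel, double threshold)
    {
        double score;
        return scores.TryGetValue(spamLabel, out score) && score >= threshold;
    }
}
```

A higher threshold lets more spam through but flags fewer legitimate items, so the right value is something to tune against your own data.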

Because the output contains my authorization key as well as the set of bad words I used for testing, I’m not going to provide a screenshot of the output, but you can download and test this code yourself.

Conclusion

Google Prediction API can be a very helpful service for programmers, but in my opinion it has some problems. First, the free quota is very low: 100 queries per day is not enough even for spam filtering on a very small, young blog. Second, it depends on Google Storage, which is entirely commercial, and that makes it harder for individuals and non-profit organizations to use the service effectively. Third, authorizing with a username and password is not very appealing, because the same credentials are used on several other services, notably Gmail and Google Docs, where users may keep very sensitive information. Even though an authorization key is available as an option, the user is still forced to enter his username and password at least once in a client application or library that is not verified. One simple solution would be for Google to provide a web interface where users can generate such keys.

That said, this service can be used effectively in many applications. One very simple application is a personalized spam filter for sites and blogs, something that centralized services like Akismet do not offer because of the huge amount of data they would need to handle and store for individual blogs. Another is a recommendation system that works from the past behavior of a site's users.

If you’re interested in running the code yourself, I’ve uploaded the sample code package for this post.

Published at DZone with permission of Keyvan Nayyeri. See the original article here.
