More and more data is now available to help inform businesses about their customers, and the businesses that successfully use these new sources and quantities of data will be able to provide a superior customer experience. However, predicting customer behavior remains very challenging.
This post is the first in a series where we will go over examples of how Joe Blue, a Data Scientist in MapR Professional Services, assisted MapR customers in identifying new data sources and applying machine learning algorithms in order to better understand their customers. The first example in the series shows an advertising customer 360°; the next blog post in the series will cover banking and healthcare customer 360° examples.
Customer 360° Revolution: Before, During, and After
MapR works with companies that have solved business problems but are limited in what they can do with their data, and they are looking for the next step. Often, they are analyzing structured data in their data warehouse, but they want to be able to do more.
The goal of the customer 360° revolution is to transform your business by figuring out what customers are going to do—if you can predict what your customer is going to do next, then you can add value to your business.
During = Data Scientist
There is a lot of confusion about what a data scientist does. A data scientist is someone who can draw the lines between the before and after, and this involves:
- Identifying new data sources, which traditional analytics or databases are not using due to the format, size, or structure.
- Collecting, correlating, and analyzing data across multiple data sources.
- Knowing and applying the right kind of machine learning algorithms to get value out of the data.
The goal of the customer 360° revolution is generally to:
- Find additional information
- Accentuate business expertise with new learning
- Use new learning to change your business
Use Case: Advertising Customer 360° Before
As an example use case for customer 360°, we'll look at a mobile banking website that displays ads related to what the customer bought with his or her debit card. A “card-to-link” company hosts the ads and determines which ad to display. The advertisers are pharmacies, restaurants, grocery stores, etc., and they want to get their ads to the people who would be interested in buying their products. In order to target ads to interested customers, you need to accurately predict the likelihood that a given ad will be clicked, also known as the "click-through rate" (CTR).
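To make the metric concrete, here is a minimal sketch of how an observed click-through rate could be computed from campaign counts. The function name and the numbers are assumptions for illustration; they are not taken from the card-to-link company's data.

```python
def click_through_rate(clicks: int, impressions: int) -> float:
    """Fraction of ad impressions that resulted in a click."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    if not 0 <= clicks <= impressions:
        raise ValueError("clicks must be between 0 and impressions")
    return clicks / impressions


# Hypothetical campaign counts, for illustration only.
ctr = click_through_rate(clicks=30, impressions=1000)
print(f"{ctr:.1%}")  # prints "3.0%"
```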
The “card-to-link” company was already capable of displaying ads related to purchases, but they wanted to do a better job of targeting and measuring the success of their ads.
As an example, let’s take a look at a new campaign for Chris’s Crème doughnuts:
- Chris’s Crème asks the “card-to-link” company to find people who like doughnuts but who are not already their customers.
- Card-to-link first uses data warehouse debit card merchant information to find Dunkin' Donuts customers and customers of other quick-service restaurants that sell breakfast.
- Card-to-link also uses customer zip code information to only display Chris’s Crème ads to customers who live near a Chris’s Crème location.
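The warehouse-only targeting steps above can be sketched as a simple filter. This is an illustrative sketch, not the card-to-link company's implementation: the `Customer` record, the merchant names, and the zip codes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    customer_id: str
    zip_code: str
    merchants: frozenset  # merchants seen in the debit card history


# Hypothetical reference data, for illustration only.
ADVERTISER = "Chris's Crème"
SIMILAR_MERCHANTS = {"Dunkin' Donuts", "Breakfast QSR"}
STORE_ZIP_CODES = {"94105", "94107"}


def select_audience(customers):
    """Return IDs of nearby doughnut buyers who are not yet customers."""
    audience = []
    for customer in customers:
        already_customer = ADVERTISER in customer.merchants
        likes_doughnuts = bool(customer.merchants & SIMILAR_MERCHANTS)
        lives_nearby = customer.zip_code in STORE_ZIP_CODES
        if likes_doughnuts and lives_nearby and not already_customer:
            audience.append(customer.customer_id)
    return audience


customers = [
    Customer("c1", "94105", frozenset({"Dunkin' Donuts"})),
    Customer("c2", "94105", frozenset({"Dunkin' Donuts", ADVERTISER})),
    Customer("c3", "10001", frozenset({"Breakfast QSR"})),
    Customer("c4", "94107", frozenset({"Grocery Store"})),
]
print(select_audience(customers))  # prints ['c1']
```

Every rule in this filter depends on purchase and address data, which is why the audience it produces cannot grow beyond what the warehouse already knows.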
This works great, but suppose this method reaches 600,000 potential new customers and Chris’s Crème is willing to pay more to raise that number to 1,000,000. There is no way to do this with just the existing data warehouse data.
Advertising Customer 360° During Part 1
The first question for the data scientist is: What kind of data does card-to-link have that they could use to grow the audience of potential new Chris’s Crème customers from 600,000 to 1,000,000, and therefore maximize the profit of this campaign?
The New Data
The new data comes from Internet browsing history; it consists of billions of device IDs and the keyword content of the browsing history for that ID.
So how can the data scientist take this new data and find a bigger audience?
Advertising Customer 360° During Part 2
Get the Data on the MapR Platform (NFS)
The first part of the solution workflow is to get the data on the MapR Platform, which is easy via the Network File System (NFS) protocol. Unlike other Hadoop distributions that only allow cluster data import or import as a batch operation, MapR enables you to mount the cluster itself via NFS, and the MapR File System enables direct file modification and multiple concurrent reads and writes via POSIX semantics. An NFS-mounted cluster allows easy data ingestion from other machines using standard Linux commands, utilities, applications, and scripts.
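Because an NFS-mounted cluster looks like an ordinary directory, ingestion can be an ordinary file copy. The sketch below assumes the cluster is mounted somewhere on the local file system; the mount path in the comment and the file names are hypothetical, and the demo uses a temporary directory as a stand-in for the mount.

```python
import shutil
import tempfile
from pathlib import Path


def ingest_via_nfs(source_file, cluster_dir):
    """Copy a local file into a directory on the NFS-mounted cluster."""
    source = Path(source_file)
    target_dir = Path(cluster_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    # A plain file copy: no cluster-specific client library is involved.
    return Path(shutil.copy2(source, target_dir / source.name))


if __name__ == "__main__":
    # On a real system cluster_dir would be a path under the NFS mount,
    # e.g. "/mapr/<cluster-name>/data/browsing" (hypothetical path).
    with tempfile.TemporaryDirectory() as tmp:
        export = Path(tmp) / "browsing_history.txt"
        export.write_text("device-1\tcoffee bakery\n")
        copied = ingest_via_nfs(export, Path(tmp) / "mount" / "browsing")
        print(copied.name)  # prints "browsing_history.txt"
```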
The complete customer purchase and past campaign history data is exported from the traditional data warehouse and put on the MapR Platform as Parquet tables. The Internet browsing history is put on the MapR platform as text documents.
Once the browsing, campaign, and purchase history data is on the MapR platform, Apache Drill is used for interactive exploration and preprocessing of the data with a schema-free SQL query engine.
Features are the interesting properties in the data that you can use to make predictions.
Feature engineering is the process of transforming raw data into inputs for a machine learning algorithm. In order to be used in Spark machine learning algorithms, features have to be put into Feature Vectors, which are vectors of numbers representing the value for each feature. To build a classifier model, you extract and test to find the features of interest that most contribute to the classification.
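As a rough illustration of what a feature vector is, the following plain-Python sketch turns keyword lists into fixed-length vectors of counts. It is not Spark code, and the device IDs and keywords are made up; in Spark the same idea is handled by its own vector types and feature transformers.

```python
from collections import Counter


def build_vocabulary(documents):
    """Assign each distinct keyword a fixed position in the vector."""
    words = sorted({word for doc in documents for word in doc})
    return {word: index for index, word in enumerate(words)}


def to_feature_vector(document, vocabulary):
    """Represent one document as a list of per-keyword counts."""
    vector = [0.0] * len(vocabulary)
    for word, count in Counter(document).items():
        if word in vocabulary:
            vector[vocabulary[word]] = float(count)
    return vector


# Hypothetical browsing keywords per device ID.
browsing = {
    "device-1": ["coffee", "bakery", "coffee"],
    "device-2": ["cars", "bakery"],
}
vocabulary = build_vocabulary(browsing.values())
vectors = {
    device: to_feature_vector(words, vocabulary)
    for device, words in browsing.items()
}
print(vocabulary)           # {'bakery': 0, 'cars': 1, 'coffee': 2}
print(vectors["device-1"])  # [1.0, 0.0, 2.0]
```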
Apache Spark for Text Analytics
The TF-IDF (Term Frequency–Inverse Document Frequency) function in Spark MLlib can be used to convert text words into feature vectors. TF-IDF scores how important a word is to a document relative to a collection of documents. For each word, it computes:
- Term Frequency (TF) is the number of times a word occurs in a specific document.
- Document Frequency (DF) is the number of documents in the collection that contain the word.
- TF * IDF, where the Inverse Document Frequency (IDF) shrinks as the document frequency grows, measures the significance of a word in a document (the word occurs a lot in that document, but is rare in the collection of documents).
For example, if you had a collection of documents about football, then the word "concussion" in a document would be more relevant for that document than the word "football", because "football" appears in nearly every document in the collection.
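The football example can be worked through in a few lines of plain Python. This sketch uses the smoothed formula IDF = log((N + 1) / (DF + 1)), where N is the number of documents, which is the form Spark MLlib documents for its IDF; the three tiny documents are invented for illustration, and at scale Spark's `HashingTF` and `IDF` transformers would be used instead.

```python
import math
from collections import Counter


def inverse_document_frequency(documents):
    """IDF per term: log((N + 1) / (DF + 1)), with N = number of documents."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each term once per document
    return {
        term: math.log((n_docs + 1) / (df + 1))
        for term, df in doc_freq.items()
    }


def tf_idf(document, idf):
    """TF * IDF for every term in one document of the collection."""
    return {term: tf * idf[term] for term, tf in Counter(document).items()}


# Hypothetical collection of documents about football.
documents = [
    ["football", "football", "concussion", "football"],
    ["football", "touchdown"],
    ["football", "quarterback", "football"],
]
idf = inverse_document_frequency(documents)
weights = tf_idf(documents[0], idf)
print(weights["football"])              # 0.0: the word is in every document
print(round(weights["concussion"], 3))  # 0.693: rare in the collection
```

Even though "football" occurs three times in the first document, its weight is zero because it carries no information that distinguishes one document from another.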
Machine Learning and Classification
Classification is a family of supervised machine learning algorithms that identify which category an item belongs to (such as whether a customer likes doughnuts or not), based on labeled data (such as purchase history). Classification takes a set of data with labels and features and learns how to label new records based on that information. In this example, the purchase history is used to label customers who bought doughnuts. The browsing history of millions of text keywords, many of which have seemingly nothing to do with doughnuts, is used as the features to discover similarities and categorize customer segments.
- Label → bought doughnuts → 1 or 0
- Features → browsing history → TF-IDF features
Once the browsing and purchase history data is represented as labeled feature vectors, Spark machine learning classification algorithms such as logistic regression, decision trees, and random forests can be used to train a model that captures the relationship between the features and the label. This model can then be used to make predictions on new data.
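To show what "training a model on labeled feature vectors" means, here is a small logistic regression trained by batch gradient descent in plain Python. It is a stand-in for the Spark classifiers named above, not Spark's API, and the two-feature data set (weights for the keywords "bakery" and "cars") is invented.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_probability(x, weights, bias):
    """Predicted probability that the label is 1 for feature vector x."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic_regression(rows, labels, learning_rate=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = predict_probability(x, weights, bias) - y
            for j, xj in enumerate(x):
                grad_w[j] += error * xj
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


# Hypothetical features per device: [weight of "bakery", weight of "cars"].
rows = [[0.9, 0.0], [0.7, 0.1], [0.0, 0.8], [0.1, 0.9]]
labels = [1, 1, 0, 0]  # 1 = bought doughnuts, 0 = did not

weights, bias = train_logistic_regression(rows, labels)
likely = predict_probability([0.8, 0.0], weights, bias)
unlikely = predict_probability([0.0, 0.8], weights, bias)
print(likely > 0.5, unlikely < 0.5)  # prints "True True"
```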
After the feature extraction and model building, the model can be used to score the billions of device IDs and rank them from most to least likely to eat doughnuts. With that ranking, the audience of potential doughnut eaters can be extended to the goal of one million.
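The ranking step can be sketched as a top-N selection over model scores. This is an illustrative sketch: the scores, device IDs, and the assumption that the existing audience is already expressed as device IDs are all hypothetical, and at the scale of billions of IDs this would run as a distributed job rather than in one process.

```python
import heapq


def extend_audience(scores, existing_audience, target_size):
    """Pick the highest-scoring new device IDs needed to reach target_size."""
    needed = max(target_size - len(existing_audience), 0)
    candidates = (
        (score, device_id)
        for device_id, score in scores.items()
        if device_id not in existing_audience
    )
    # nlargest keeps only `needed` items in memory while scanning.
    top = heapq.nlargest(needed, candidates)
    return [device_id for _score, device_id in top]


# Hypothetical predicted probabilities of liking doughnuts.
scores = {"a": 0.91, "b": 0.15, "c": 0.78, "d": 0.64}
existing = {"a"}  # already reached through warehouse-based targeting
print(extend_audience(scores, existing, target_size=3))  # prints ['c', 'd']
```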
Advertising Customer 360° After
Each colored line in the graph below refers to the results of modeling with data sets from different categories of advertising campaigns: Quick Serve Restaurant (QSR), Full Serve Restaurant, Light Fare, Auto, Grocery, and Apparel. For each campaign:
1. The purchase history was used to label the population:
   - For the grocery campaign, the following modeling populations were created:
     - Those who shopped at grocery stores multiple times
     - Those who bought their food at other stores
   - For the Full Serve restaurant campaign, the following modeling populations were created:
     - Yes: had at least 2 visits to a full serve restaurant in the last 2 months (90th percentile)
     - No: had at least 18 visits to a quick serve restaurant in last 2 months (90th percentile)
     - Excluded those who were in both (small percentage)
2. The browsing history from the labeled populations was used for the features.
3. The classification model was built from the labeled features.
4. The model was used to predict the Click-Through Rate.
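The labeling rule for the Full Serve restaurant campaign in step 1 can be written out directly. The thresholds come from the description above; the function name, the use of `None` for excluded customers, and the choice to also leave customers who meet neither threshold unlabeled are assumptions of this sketch.

```python
FULL_SERVE_MIN_VISITS = 2    # "Yes" threshold, visits in the last 2 months
QUICK_SERVE_MIN_VISITS = 18  # "No" threshold, visits in the last 2 months


def label_full_serve(full_serve_visits, quick_serve_visits):
    """Return 1 (Yes), 0 (No), or None when the customer is not modeled."""
    is_yes = full_serve_visits >= FULL_SERVE_MIN_VISITS
    is_no = quick_serve_visits >= QUICK_SERVE_MIN_VISITS
    if is_yes and is_no:
        return None  # in both populations: excluded
    if is_yes:
        return 1
    if is_no:
        return 0
    return None  # in neither population (assumption: left unlabeled)


# Hypothetical visit counts: (full serve visits, quick serve visits).
print(label_full_serve(3, 1))   # prints 1
print(label_full_serve(0, 20))  # prints 0
print(label_full_serve(2, 18))  # prints None
```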
The left side of this graph is the percentage of people visiting the banking website who clicked on an advertisement. The right side is the predicted probability that a person would click on the ad. If the colored lines are continuously increasing from left to right, that means the machine learning model worked well at predicting the click-through rates.
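One way to check whether a line would keep increasing is to group visitors into buckets by predicted probability and compare the observed click-through rate per bucket. The sketch below does this on invented data; the bucket count and the function names are assumptions, not the method used to produce the graph.

```python
def ctr_by_probability_bucket(predictions, clicks, n_buckets):
    """Observed click-through rate for each equal-width probability bucket."""
    totals = [0] * n_buckets
    clicked = [0] * n_buckets
    for probability, click in zip(predictions, clicks):
        bucket = min(int(probability * n_buckets), n_buckets - 1)
        totals[bucket] += 1
        clicked[bucket] += click
    # None marks a bucket that received no visitors.
    return [
        clicked[i] / totals[i] if totals[i] else None
        for i in range(n_buckets)
    ]


def is_increasing(values):
    """True when every non-empty bucket has a higher rate than the last."""
    observed = [v for v in values if v is not None]
    return all(a < b for a, b in zip(observed, observed[1:]))


# Hypothetical predicted probabilities and observed clicks (1 = clicked).
predictions = [0.1, 0.2, 0.25, 0.4, 0.5, 0.55, 0.7, 0.8, 0.9]
clicks = [0, 0, 1, 0, 1, 1, 1, 1, 1]

rates = ctr_by_probability_bucket(predictions, clicks, n_buckets=3)
print(is_increasing(rates))  # prints "True"
```

A campaign whose rates stay flat or move up and down across buckets is one where the model's scores carry little information about who will click.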
For full-service restaurants (the yellow line), the probability of clicking tracks well with the click-through rate. At a low probability, the click-through rate was 30%, and at a high probability the model found populations with a 70% click-through rate. So this campaign responded well to the method of finding similar customer profiles from their Internet browsing history.
Some campaigns, like grocery, did not work: the browsing history did not help predict click-through rates for grocery ads, which the card-to-link company did not know beforehand. These are insights that can be used, for example, if card-to-link has a new automotive, QSR, or full-serve campaign. For such campaigns, the method works well and can be used for advertisers that will pay to target more customers.
Advertising Customer 360° Summary
In this example, we discussed how data science can use customer behavioral data to find customer groups who share behavioral affinities in order to better target advertisements. For more information, review the following resources.
Want to learn more?