This post is the second in a series that walks through examples of how MapR data scientist Joe Blue helped MapR customers identify new data sources and apply machine learning algorithms to better understand their customers. If you have not already read the first part of this customer 360° series, it is worth reading first. This second part covers a bank customer profitability 360° example, presenting the before, during, and after.
Bank Customer Profitability: Initial State
The back story: A regional bank wanted to gain insights about what’s important to their customers based on their activity with the bank. They wanted to establish a digital profile via a customer 360-degree solution in order to enhance the customer experience, to tailor products, and to make sure customers had the right product for their banking style.
As you probably know, profit is equal to revenue minus cost. Customer profitability is the profit the firm makes from serving a customer or customer group, which is the difference between the revenues and the costs associated with the customer in a specified period.
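As a minimal sketch of that definition, the snippet below sums revenue minus cost per customer over a specified period. The ledger layout, the customer IDs, and the amounts are all hypothetical and are not taken from the bank's data; how revenue and cost are attributed to a customer is assumed to have been decided upstream.

```python
from collections import defaultdict
from datetime import date

# Hypothetical ledger rows: (customer_id, posting_date, revenue, cost).
ledger = [
    ("c1", date(2017, 1, 5), 12.00, 3.50),
    ("c1", date(2017, 2, 9), 8.00, 1.25),
    ("c2", date(2017, 1, 20), 4.00, 6.00),
    ("c2", date(2017, 4, 2), 30.00, 2.00),  # falls outside the period used below
]

def customer_profitability(rows, start, end):
    """Return {customer_id: revenue - cost} for rows dated within [start, end]."""
    profit = defaultdict(float)
    for customer_id, posted, revenue, cost in rows:
        if start <= posted <= end:
            profit[customer_id] += revenue - cost
    return dict(profit)

# Profitability for the first quarter of the (made-up) year.
q1 = customer_profitability(ledger, date(2017, 1, 1), date(2017, 3, 31))
print(q1)
```

A customer can be unprofitable in a given period, as the hypothetical customer c2 is here, which is the kind of case a profitability analysis is meant to surface.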
Banks have a combination of fixed and variable costs. For example, a building is a fixed cost, while the cost of serving ATM transactions is variable because it depends on how often a person uses an ATM. The bank wanted to understand the link between their product offerings, customer behavior, and customer attitudes toward the bank in order to identify growth opportunities.
Bank Customer Profitability: During
The bank had a lot of different sources of customer data:
- Data warehouse account information.
- Debit card purchase information.
- Loan information such as what kind of loan and how long it has been open.
- Online transaction information such as who they’re paying and how often they’re online (or if they go online at all).
- Survey data.
Analyzing Data Across Multiple Data Sources
This data was isolated in silos, making it difficult to understand the relationships between the bank’s products and the customer’s activities. The first part of the solution workflow is to get the data into the MapR Platform, which is easy since MapR enables you to mount the cluster itself via NFS. Once all of the data is on the MapR Platform, the data relationships can be explored interactively using the Apache Drill schema-free SQL query engine.
A key data source was a survey that the bank had conducted in order to segment their customers based on their attitudes toward the bank. The survey asked questions like, “Are you embracing new technologies?” and “Are you trying to save?” The responses were then analyzed by a third party in order to define four customer personas. However, the usefulness of this survey was limited because it was only performed on 2% of the customers. The question for the data scientist was, “How do I take this survey data and segment the other 98% of the customers, in order to make the customer experience with the bank more profitable?”
Feature engineering is the process of transforming raw data into inputs for a machine learning algorithm. With data science, you often hear about the algorithms that are used, but a bigger part of the work (consuming about 80% of a data scientist’s time) is taking the raw data and combining it in a way that is most predictive.
The goal was to find interesting properties in the bank’s data that could be used to segment the customers into groups based on their activities. A key part of finding the interesting properties in the data was working with the bank’s domain experts, because they know their customers better than anyone.
Apache Drill was used to extract features such as:
- What kind of loans, mix of accounts, and mix of loans does the customer have?
- How often does the customer use a debit card?
- How often does the customer go online?
- What does the customer buy? How much do they spend? Where do they shop?
- How often does the customer go into the branches?
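The bank's feature extraction was done with Apache Drill queries, which are not shown in this post. As a stand-in, here is a small Python sketch of the same idea: rolling raw activity records up into one row of features per customer. The event layout, channel names, and feature names are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical raw events: (customer_id, channel, merchant, amount).
events = [
    ("c1", "debit", "grocery", 42.10),
    ("c1", "debit", "fitness", 30.00),
    ("c1", "online", None, 0.0),
    ("c2", "branch", None, 0.0),
    ("c2", "debit", "grocery", 12.50),
    ("c1", "debit", "grocery", 18.40),
]

def build_features(rows):
    """Aggregate raw events into one feature dict per customer."""
    channel_counts = defaultdict(Counter)  # customer -> channel -> event count
    debit_spend = defaultdict(float)       # customer -> total debit spend
    merchants = defaultdict(Counter)       # customer -> merchant -> purchases
    for cid, channel, merchant, amount in rows:
        channel_counts[cid][channel] += 1
        if channel == "debit":
            debit_spend[cid] += amount
            merchants[cid][merchant] += 1

    features = {}
    for cid, counts in channel_counts.items():
        top = merchants[cid].most_common(1)
        features[cid] = {
            "debit_uses": counts["debit"],
            "online_sessions": counts["online"],
            "branch_visits": counts["branch"],
            "debit_spend": round(debit_spend[cid], 2),
            "top_merchant": top[0][0] if top else None,
        }
    return features

features = build_features(events)
print(features["c1"])
```

In practice, which aggregates are worth computing is exactly where the domain experts' input matters; the five features here are only placeholders.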
Accentuate Business Expertise With New Learning
After the behavior of the customers was extracted, it was possible to link these features by customer ID with the labeled survey data, in order to perform machine learning.
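The linking step can be sketched as a join on customer ID. In this hypothetical example, customers with a survey persona become labeled training rows, and everyone else becomes a row to be scored later. The feature names, IDs, and persona letters are illustrative assumptions.

```python
# Hypothetical per-customer features keyed by customer ID.
features = {
    "c1": {"debit_uses": 3, "online_sessions": 1},
    "c2": {"debit_uses": 1, "online_sessions": 0},
    "c3": {"debit_uses": 7, "online_sessions": 12},
}

# Hypothetical survey results: customer ID -> persona label.
# "c9" answered the survey but has no activity features, so it is dropped.
survey_personas = {"c1": "A", "c3": "D", "c9": "B"}

def link_by_customer_id(features, survey_personas):
    """Split customers into labeled (surveyed) and unlabeled (not surveyed) rows."""
    labeled, unlabeled = [], []
    for cid, feats in features.items():
        if cid in survey_personas:
            labeled.append((cid, feats, survey_personas[cid]))
        else:
            unlabeled.append((cid, feats))
    return labeled, unlabeled

labeled, unlabeled = link_by_customer_id(features, survey_personas)
print(len(labeled), "labeled;", len(unlabeled), "to be scored")
```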
The statistical computing language R was used to build segmentation models using many machine learning classification techniques. The result was four independent ensembles, each predicting the likelihood that a customer belongs to one persona, based on that customer’s banking activity.
The customer segments were merged and tested against the labeled survey data, which made it possible to link the survey “customer attitude” personas with customers’ banking actions and provide insights.
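The original models were built in R and are not shown here, and the post does not say how the four outputs were merged. The Python sketch below illustrates one plausible arrangement under stated assumptions: each persona gets its own binary (one-vs-rest) target, the four independent likelihoods are merged by taking the highest score, and the result is checked against the survey labels. The persona letters, scores, and function names are hypothetical.

```python
PERSONAS = ("A", "B", "C", "D")

def one_vs_rest_targets(labels, persona):
    """Binary training target for one persona's model: 1 if the label matches."""
    return [1 if label == persona else 0 for label in labels]

def assign_persona(scores):
    """Merge four independent likelihoods by picking the highest-scoring persona.

    Taking the maximum is an assumption; the source does not describe the merge rule.
    """
    return max(PERSONAS, key=lambda p: scores[p])

def agreement(scored, survey_labels):
    """Fraction of surveyed customers whose assigned persona matches the survey."""
    hits = sum(assign_persona(scored[cid]) == label
               for cid, label in survey_labels.items())
    return hits / len(survey_labels)

# Hypothetical per-persona likelihoods, standing in for the four models' outputs.
scored = {
    "c1": {"A": 0.7, "B": 0.1, "C": 0.3, "D": 0.2},
    "c2": {"A": 0.2, "B": 0.3, "C": 0.1, "D": 0.8},
    "c3": {"A": 0.1, "B": 0.6, "C": 0.4, "D": 0.2},
}
survey_labels = {"c1": "A", "c2": "D", "c3": "C"}

print(agreement(scored, survey_labels))
```

Because the four models are trained independently, their scores need not sum to one, which is why the merge compares them directly instead of treating them as a single probability distribution.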
Banking Customer 360: After
The results of modeling the customer data are displayed in the Customer Products heat map below. Each column is an “attitude”-based persona and each row is a type of bank account or loan. Green indicates the persona is more likely to have this product, and red indicates less likely.
This heat map helps define these personas by the kinds of products they like or don’t like.
This can give insight into:
- How to price some products.
- How to generate fees.
- Gateway products, which help move a customer from a less profitable segment to a more profitable one.
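The post does not state which metric sits behind the heat map colors. One common choice, assumed here, is lift: the share of a persona holding a product divided by the share of all customers holding it. Values above 1 would correspond to green cells and values below 1 to red. The personas, products, and holdings below are made up.

```python
from collections import defaultdict

# Hypothetical rows: (persona, set of products the customer holds).
customers = [
    ("A", {"checking", "mortgage"}),
    ("A", {"checking"}),
    ("D", {"checking", "auto_loan"}),
    ("D", {"auto_loan"}),
]

def product_lift(rows):
    """Return {(persona, product): holding rate in persona / overall holding rate}."""
    persona_size = defaultdict(int)  # persona -> number of customers
    holders = defaultdict(int)       # (persona, product) -> number of holders
    overall = defaultdict(int)       # product -> number of holders, all personas
    for persona, products in rows:
        persona_size[persona] += 1
        for product in products:
            holders[(persona, product)] += 1
            overall[product] += 1

    total = len(rows)
    lift = {}
    for persona, size in persona_size.items():
        for product, count in overall.items():
            rate_in_persona = holders[(persona, product)] / size
            lift[(persona, product)] = rate_in_persona / (count / total)
    return lift

lift = product_lift(customers)
for key in sorted(lift):
    print(key, round(lift[key], 2))
```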
In the Customer Payees heat map below, the rows are electronic payees. This heat map shows where customer personas are spending their money, which can suggest channels for targeting and attracting a certain persona to grow your business.
In this graph, the bright green blocks show that Fitness club A, Credit card/Bank A, and Credit card/Bank C are really strong for Persona D. Persona A is almost the opposite of the other personas. This “customer payees” heat map gives a strong signal about persona behavior. It’s hard to find signals like this, but they provide an additional way to look at customer data in a way that the bank could not conceive of before.
Bank Customer 360-Degree Summary
In this example, we discussed how data science can link customer behavioral data with a small sample survey to find customer groups who share behavioral affinities, which can be used to better identify a customer base for growth. The bank can now project growth rates based on the transitions between personas over time and find direct marketing channels for targeting specific personas.
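To make the idea of transitions between personas concrete, here is a minimal sketch that compares two snapshots of persona assignments and computes, for each starting persona, the share of customers who ended up in each persona. The snapshots, customer IDs, and persona letters are hypothetical, and this is only one simple way such transitions could be measured.

```python
from collections import Counter, defaultdict

def transition_rates(before, after):
    """Return {old_persona: {new_persona: share of customers making that move}}."""
    counts = defaultdict(Counter)
    for cid, old in before.items():
        if cid in after:  # only customers present in both snapshots
            counts[old][after[cid]] += 1
    return {
        old: {new: n / sum(moves.values()) for new, n in moves.items()}
        for old, moves in counts.items()
    }

# Hypothetical persona assignments at two points in time.
before = {"c1": "A", "c2": "A", "c3": "B", "c4": "D"}
after = {"c1": "A", "c2": "D", "c3": "B", "c4": "D"}

rates = transition_rates(before, after)
print(rates)
```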