Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

DZone's Guide to

# Calculating Customer Lifetime Value: SQL Example

### Learn about our experience calculating customer lifetime value 592 times with SQL without sophisticated statistical models.

· Big Data Zone ·
Free Resource

Comment (0)

Save
{{ articles[0].views | formatCount}} Views

Hortonworks Sandbox for HDP and HDF is your chance to get started on learning, developing, testing and trying out new features. Each download comes preconfigured with interactive tutorials, sample data and developments from the Apache community.

The Statsbot team estimated LTV 592 times for different clients and business models. We share our experience in this post and in a free ebook on how to calculate customer lifetime value with SQL without sophisticated statistical models.

Customer lifetime value, or LTV, is the amount of money that a customer will spend with your business in their “lifetime,” or at least, in the portion of it that they spend in a relationship with you. It’s an important indicator of how much you can spend on acquiring new customers. For example, your customer acquisition cost (CAC) is \$150 and LTV is \$600. You would be able to increase the budget to get more people and grow your business. The balance between CAC and LTV allows you to check any business for market survival.

Estimating LTV is a predictive metric that depends on future purchases based on past patterns, and allows you to see how much risk you are exposed to as a business and how much you can afford to spend to acquire new clients. At an individual level, it also enables you to figure out who your highest-value customers are likely to be — not just now but also in the future.

In order to understand how to estimate LTV, it is useful to first think about evaluating a customer’s lifetime value at the end of their relationship with us. So, say a customer stays with us for 12 months and spends \$50 per month.

The revenue that they generated for our business over their lifetime is then \$50*12 = \$600. Simple! We can consider the basic definition of LTV as a sum of payments from a specific user.

This same principle applies to the group. If we want to see the average LTV for the group, we can look at total spend divided by the number of customers. When we’re talking about estimating LTV for the group, or predictive LTV, we need to take into account how long customers stay with us. To get that, we actually look at it “backward:” we look at how many customers we lose over time, or the churn rate.

## How Do You Calculate LTV for SaaS?

At a group level, the basic formula for estimating LTV is this:

Where ARPU is average monthly recurring revenue per user and the churn rate is the rate at which we are losing customers (so the inverse of retention).

This basic formula can be obtained from the assumption:

Next Month Revenue = (Current Month Revenue) * (1 - Churn Rate)

Note: When we’re estimating customer lifetime value for SaaS, we can neglect Gross Margin because costs are minor and don’t affect the accuracy of a result. But when we calculate predictive LTV for e-commerce later in this article, we’ll include COGS in our formula.

The main limitation of the LTV formula above is that it assumes that churn is linear over time, as in, we are as likely to lose a customer between the first month of membership of our service and the second as we are to lose them much later on. Going deeper into the nature of predictive LTV, we can say that it’s a sum of a geometric series, and linear churn doesn’t look like a straight line (as is shown in many articles about LTV).

In fact, we know that linear churn is usually not the case.

In a flexible subscription model, we lose many people at the very beginning when they are “testing out” a service, but once they have been with us for a long time, they are less likely to leave.

Ultimately, it depends on the type of contract that exists between customers and the business: for example, annual renewals, where churn is more linear, will result in LTV that is very close to the formula above.

Services that do not have any contracts may lose a high percentage of their new customers, but then, churn may slow down.

We can think of this concept graphically:

If the LTV of the group is the area under the line, we can very clearly see that the rate at which we lose customers will impact our LTV estimates vary significantly. So, we will need to take this into account when we are making our calculations. For the first estimate of LTV, however, it makes sense to go with the simplest formula. After that, we will add levels of complexity.

## Extracting ARPU and Churn Using SQL

You can calculate ARPU in two steps:

``````month_ARPU AS
(SELECT
visit_month,
Avg(revenue) AS ARPU
FROM
(SELECT
Cust_id,
Datediff(MONTH, ‘2010-01-01’, transaction_date) AS visit_month,
Sum(transaction_size) AS revenue
FROM   transactions
WHERE  transaction_date > Dateadd(‘year’, -1, CURRENT_DATE)
GROUP BY
1,
2)
GROUP BY 1)``````

The results will look like this:

In the case above, that would give us an average monthly spend of \$987.33.

Calculating churn rate is a bit more complicated, as we need the percentage of people not returning from one month to the next, taking each group of customers according to the month of their first visit, and then checking if they came back or not in the following month.

The problem is always that, in a transactional database, we have customers’ visits on separate lines, rather than all on the same line.

The way to fix that problem is to join the transactional database to itself so that we can see a customer’s behavior on one single line.

In order to isolate those who churned, we take the visits from month 1 and left join the visits from month 2 on the cust_id. The lines where visits from month 2 have a cust_id that is null are the ones where the customer has not returned.

``````WITH monthly_visits AS
(SELECT
DISTINCT
Datediff(month, ‘2010-01-01’, transaction_date) AS visit_month,
cust_id
FROM            transactions
WHERE

(SELECT
avg(churn_rate)
FROM
(SELECT
current_month,
Count(CASE
WHEN cust_type='churn' THEN 1
ELSE NULL
END)/count(cust_id) AS churn_rate
FROM
(SELECT
past_month.visit_month + interval ‘1 month’ AS current_month,
past_month.cust_id,
CASE
WHEN this_month.cust_id IS NULL THEN 'churn'
ELSE 'retained'
END AS cust_type
FROM
monthly_visits past_month
LEFT JOIN monthly_visits this_month ON
this_month.cust_id=past_month.cust_id
AND this_month.visit_month=past_month.visit_month + interval ‘1 month’
)data
GROUP BY 1)
)``````

Say this gives us a result of 0.1, just for simplicity.

It is a simple calculation, then, to estimate LTV: we have monthly ARPU and monthly churn, so we just divide one by the other!

\$987.33/0.1 = \$9873.3

As stated earlier, there are limits to this formula, mostly because it makes a series of assumptions that may not hold in the real world. The main one is that retention and churn rates are stable both across cohorts and across time.

Stability across cohorts implies that early adopters of your service act in similar ways to late adopters, while stability across time implies that customers’ likelihood of churning out is the same at the beginning of their relationship with you as it is, for example, two years in. Depending on how close to the truth these assumptions are, you may need to revise your LTV estimate downwards.

If you want to learn how to estimate LTV for e-commerce, for cohorts, and each individual customer, download our free ebook on calculating customer lifetime value with SQL.

Hortonworks Community Connection (HCC) is an online collaboration destination for developers, DevOps, customers and partners to get answers to questions, collaborate on technical articles and share code examples from GitHub.  Join the discussion.

Topics:
sql ,business metrics ,data analytics ,big data ,tutorial

Comment (0)

Save
{{ articles[0].views | formatCount}} Views

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.