A Guide to Data Warehousing Clickstream Data, Part 2
We wrap up this two-part series by examining the different big data analyses/projects that can be tackled with clickstream data.
Table of Contents
- Why clickstream is so important to your online business
- No data science without data
- Understanding customer – key advantage
- Going beyond charts and dashboards
- What is clickstream data?
- Example data output
- You can read Part 1 here.
- Clickstream analysis
- Traffic analysis
- Sales funnel analysis
- Browse/Cart abandonment and recovery
- Tracking Experiments (A/B testing)
- Identity Stitching
Traffic Analysis
The easiest way to use clickstream data is to see where a website's traffic comes from. This may sound trivial given how many online tools serve this purpose, but getting true numbers down to the individual visitor level requires owning the clickstream data. We can analyze not only which source brings us the most traffic, but also:
- Find which keywords are most popular.
- Compare conversion rates of visitors from different traffic sources.
- Do a cohort analysis.
- Determine which marketing campaign brought the most traffic. This is possible because utm query parameters are parsed automatically and made available in the unified data warehouse, which lets us track any kind of campaign, from paid advertising to email.
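To make the utm parsing step concrete, here is a minimal Python sketch of how campaign parameters could be pulled out of a landing-page URL. The function name `parse_utm`, the list of keys, and the example URL are illustrative assumptions, not part of any particular pipeline.

```python
from urllib.parse import urlparse, parse_qs

# Commonly used utm_* keys; adjust to whatever your campaigns actually send.
UTM_KEYS = ("utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content")


def parse_utm(url: str) -> dict:
    """Return the utm_* query parameters found in a landing-page URL."""
    query = parse_qs(urlparse(url).query)
    # parse_qs returns a list of values per key; keep the first one.
    return {key: query[key][0] for key in UTM_KEYS if key in query}


# Hypothetical landing-page URL from an email campaign.
example = parse_utm(
    "https://shop.example.com/landing"
    "?utm_source=newsletter&utm_medium=email&utm_campaign=spring_sale&ref=abc"
)
print(example)
```

Storing these values as columns next to each page-view event is what makes grouping traffic and conversions by campaign a simple query later on.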
Besides the above, we can extend the tracking to measure email campaign performance, such as open and click rates. This is especially useful for making your analytics independent of any ESP (email service provider), so you can migrate from one email provider to another without losing performance data.
Sales Funnel Analysis
To determine how well our website converts visitors into sales, we often use a sales funnel. We define the stages of the customer journey, from landing on your website (or app) to paying for a product. Each stage usually has a drop-off percentage, which can occur for many reasons. Clickstream data can expose these problems. For example, if one product page has a much higher CTR than another, we could investigate the reason and try to improve the weaker page, for instance by updating its content. We'll see later, in the section on tracking experiments, how we can test such improvements.
Beyond single-stage problems, sales funnels can serve as a health metric to quickly detect when conversion starts dropping off at a certain stage. Such a drop could mean that part of our system has stopped working and requires quick action. For an online business, where every lost hour can cost thousands of dollars, having this visibility is critical.
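As a rough illustration of the funnel idea, the sketch below counts how many visitors reach each stage and the percentage lost between consecutive stages. The stage names in `FUNNEL` and the sample events are assumptions made for the example; a real funnel would use whatever events your tracking emits.

```python
from collections import defaultdict

# Hypothetical funnel stages, ordered from landing to purchase.
FUNNEL = ["landing_view", "product_view", "checkout_form_view", "order_confirmation_view"]


def funnel_report(events, stages=FUNNEL):
    """events: iterable of (visitor_id, event_name) pairs.

    Returns one (stage, visitors, drop_off_pct) tuple per stage, where
    drop_off_pct is the share of visitors lost since the previous stage.
    """
    seen = defaultdict(set)
    for visitor_id, event in events:
        seen[event].add(visitor_id)

    report = []
    reached = None  # visitors who passed every stage so far
    for stage in stages:
        current = seen[stage] if reached is None else reached & seen[stage]
        if not reached:  # first stage, or nobody left to lose
            drop_off = 0.0
        else:
            drop_off = 100.0 * (len(reached) - len(current)) / len(reached)
        report.append((stage, len(current), round(drop_off, 1)))
        reached = current
    return report


# Made-up events: v1 buys, v2 stops at checkout, v3 at a product page, v4 at landing.
sample_events = [
    ("v1", "landing_view"), ("v1", "product_view"),
    ("v1", "checkout_form_view"), ("v1", "order_confirmation_view"),
    ("v2", "landing_view"), ("v2", "product_view"), ("v2", "checkout_form_view"),
    ("v3", "landing_view"), ("v3", "product_view"),
    ("v4", "landing_view"),
]
report = funnel_report(sample_events)
for row in report:
    print(row)
```

Running a report like this on a schedule and alerting when a stage's drop-off jumps is one simple way to use the funnel as a health metric.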
Browse/Cart Abandonment and Recovery
Whenever a shopper adds a product to a cart, there is a high likelihood that the cart will be abandoned. As the chart below shows, up to 80% of online customers abandon their shopping carts.
This is significant for any online business, especially if some of those abandoned carts can be recovered. To act on it, we need a way to detect that a customer has added items to the cart and then placed no order within a certain period of time. With clickstream data we can capture these customers as follows:
SELECT email, mobile, first_name, last_name
FROM customer_clickstream
WHERE visitor_id IN (SELECT visitor_id FROM cart_abandoners)
One doesn't need to be a SQL expert to understand what's happening above. We're just fetching all customers who are in the cart abandoner segment. We still need to define what a cart abandoner is in a clickstream dataset, and we can do that by relying on visitor events:
WITH cart_abandoners AS (
  -- Visitors who viewed the checkout form in the last 2 hours...
  SELECT DISTINCT visitor_id
  FROM customer_clickstream
  WHERE event = 'checkout_form_view'
    AND ts_event > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 2 HOUR)
    -- ...but did not view the order confirmation page in the same window.
    AND visitor_id NOT IN (
      SELECT visitor_id
      FROM customer_clickstream
      WHERE event = 'order_confirmation_view'
        AND ts_event > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 2 HOUR)
    )
)
The script above is a little more involved, but it shows how clickstream data can be used to segment visitors by their actions. We find all customers who visited the checkout page but haven't viewed the order confirmation page, which is shown after a purchase is completed. With this segment in hand, we can use it in email or SMS campaigns that try to recover a portion of the abandoners. Such marketing campaigns are out of scope for this article, but the most obvious action would be sending these customers a discount voucher code for your products or recommending other products they might like.
The nice thing about this approach is that it can be easily adapted to browse abandonment, that is, when a customer is browsing product pages but not buying anything. We just need to swap the 'checkout_form_view' event for 'product_view' and exclude carters/buyers. For this to work, the clickstream data has to be updated fairly frequently, so that marketing automation has a better chance of reaching customers before they forget about the purchase.
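The same segment logic can be sketched in Python to show how one function covers both cart and browse abandonment. This is an illustration under assumptions: events are in-memory tuples rather than warehouse rows, and the function name `abandoners` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta


def abandoners(events, start_event, completed_events, now, window=timedelta(hours=2)):
    """events: iterable of (visitor_id, event_name, ts_event) tuples.

    Returns visitors who fired start_event inside the window but none of
    completed_events inside the same window.
    """
    cutoff = now - window
    started, completed = set(), set()
    for visitor_id, event, ts_event in events:
        if visitor_id is None or ts_event <= cutoff:
            continue  # ignore anonymous rows and events outside the window
        if event == start_event:
            started.add(visitor_id)
        elif event in completed_events:
            completed.add(visitor_id)
    return started - completed


# Made-up events relative to a fixed "now" so the example is reproducible.
now = datetime(2024, 1, 1, 12, 0)
events = [
    ("v1", "product_view", now - timedelta(minutes=90)),
    ("v1", "checkout_form_view", now - timedelta(minutes=60)),
    ("v2", "product_view", now - timedelta(minutes=50)),
    ("v2", "checkout_form_view", now - timedelta(minutes=45)),
    ("v2", "order_confirmation_view", now - timedelta(minutes=40)),
    ("v3", "product_view", now - timedelta(minutes=30)),
    ("v4", "checkout_form_view", now - timedelta(hours=5)),  # outside the window
]

cart = abandoners(events, "checkout_form_view", {"order_confirmation_view"}, now)
browse = abandoners(
    events, "product_view", {"checkout_form_view", "order_confirmation_view"}, now
)
print(cart, browse)
```

Switching from cart to browse abandonment only changes the start event and the set of events that count as "moved on", which mirrors the swap described above.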
Cart/browse abandoners are just a subset of customers. When we have access to a full clickstream data set, we can create segments by any number of parameters, like recency, average purchase amount, geolocation, or specific products that a customer has viewed or bought in the past. With data, the only limit to segmentation is the marketer's imagination.
Once a simple analysis is in place, it is possible to utilize clickstream data for more difficult tasks, like improving customer experience. A classical use case is product recommendations.
When a store sells a lot of products, finding the right product can be difficult. The biggest online retailers like Amazon try to find similarities between products and customers to use for recommendations. This helps in two ways: it allows for easier product discovery and tailors customer shopping experiences based on their interests.
Implementing a simple recommendation model using clickstream data is not difficult. We find customers who have viewed a certain product and look at what they bought afterwards. Then we compare how often each related product was purchased by those customers. Once we have those values computed for each product, we can rank the related products and show them on the website when a visitor lands on a product page. In this case, the advantage of owning the data is that we can use any attributes related to the product that might be relevant for recommendations.
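A minimal sketch of this "viewed, then bought" idea is shown below. It simply counts co-occurrences and ranks by count; the event names `product_view` and `purchase`, the function names, and the sample data are assumptions for illustration, and a production model would likely weigh in more signals.

```python
from collections import Counter, defaultdict


def viewed_then_bought(events):
    """events: (visitor_id, event_name, product_id) tuples in chronological order.

    For each viewed product, counts which other products the same visitor
    purchased afterwards.
    """
    viewed = defaultdict(set)      # visitor_id -> products viewed so far
    counts = defaultdict(Counter)  # viewed product -> Counter of later purchases
    for visitor_id, event, product_id in events:
        if event == "product_view":
            viewed[visitor_id].add(product_id)
        elif event == "purchase":
            for seen in viewed[visitor_id]:
                if seen != product_id:  # don't recommend a product on its own page
                    counts[seen][product_id] += 1
    return counts


def recommend(counts, product_id, top_n=3):
    """Return up to top_n products most often bought after viewing product_id."""
    ranked = counts.get(product_id, Counter()).most_common(top_n)
    return [product for product, _ in ranked]


# Made-up browsing and purchase history for three visitors.
history = [
    ("a", "product_view", "tent"), ("a", "purchase", "sleeping_bag"),
    ("b", "product_view", "tent"), ("b", "product_view", "stove"),
    ("b", "purchase", "sleeping_bag"),
    ("c", "product_view", "tent"), ("c", "purchase", "stove"),
    ("c", "purchase", "tent"),
]
counts = viewed_then_bought(history)
print(recommend(counts, "tent"))
```

Because the ranking is just a lookup keyed by product, the same table can feed a product page, an email template, or any other channel.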
The same recommendations can be extended to email or other marketing campaigns without any additional changes to a model's logic or data.
Tracking Experiments (A/B Testing)
Another useful type of optimization analysis is running and tracking A/B experiments. An experiment can help you decide whether a particular change has any effect on business-relevant KPIs. For example, suppose we decide to change the design of a certain page to improve its conversion rate. The simplest approach would be to update the design and see whether, after some time, the conversion rate on that page improves. However, the conversion rate might change over time for other reasons, and comparing it across different historical periods can lead to inaccurate conclusions. The best approach is to run two different designs simultaneously for different visitors and track the outcome of each. Then, if the conversion rate improves for one design versus the other, we can be much more confident that it is really better.
Tracking experiments is not too different from tracking any other event. We just need to record which variation the visitor is viewing. The harder part is making sure that a visitor sees only one variation across multiple visits; otherwise, the results might be skewed. To do this, we split our visitors using their user agent and IP address and serve each visitor one variation or the other.
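One common way to get a stable split is to hash the visitor attributes and use the hash to pick a variation. The sketch below assumes this hashing approach, which the article does not spell out; the function name, the `experiment` parameter, and the sample values are hypothetical.

```python
import hashlib


def assign_variation(user_agent, ip_address, experiment, variations=("A", "B")):
    """Deterministically map a visitor to one variation of an experiment.

    The same user agent and IP address always yield the same variation, so a
    returning visitor keeps seeing the design they saw the first time.
    """
    key = f"{experiment}|{user_agent}|{ip_address}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variations)
    return variations[bucket]


# Hypothetical visitor and experiment name.
first_visit = assign_variation("Mozilla/5.0 (iPhone)", "203.0.113.7", "checkout_redesign")
second_visit = assign_variation("Mozilla/5.0 (iPhone)", "203.0.113.7", "checkout_redesign")
print(first_visit, second_visit)  # the same variation both times
```

Including the experiment name in the hash keeps assignments independent across experiments. Note that a visitor whose IP address or browser changes would be re-bucketed, which is a known limitation of splitting on these attributes.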
The advantage of tracking experiments together with other events is that it makes it easy to compare their effect on visitor behavior in any situation. As an example, we can find out whether a new design works as well for mobile visitors as for desktop visitors, and how their conversion or click-through rates differ. Of course, there is no limit to the kinds of experiments that can be run, tracked, and analyzed.
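Once the variation is recorded alongside other visitor attributes, breaking results down by segment is a small aggregation. The sketch below assumes each visitor record carries `variation`, `device`, and `converted` fields; those names and the sample rows are made up for the example.

```python
from collections import defaultdict


def conversion_by_segment(visitors):
    """visitors: iterable of dicts with 'variation', 'device' and 'converted' keys.

    Returns {(variation, device): conversion_rate}.
    """
    totals = defaultdict(lambda: [0, 0])  # segment -> [visitors, conversions]
    for visitor in visitors:
        cell = totals[(visitor["variation"], visitor["device"])]
        cell[0] += 1
        cell[1] += 1 if visitor["converted"] else 0
    return {segment: conv / count for segment, (count, conv) in totals.items()}


def make(variation, device, outcomes):
    # Helper for building made-up sample rows.
    return [{"variation": variation, "device": device, "converted": c} for c in outcomes]


sample = (
    make("A", "mobile", [True, False, False, False])
    + make("B", "mobile", [True, True, False, False])
    + make("A", "desktop", [True, False])
    + make("B", "desktop", [True, False])
)
rates = conversion_by_segment(sample)
print(rates)
```

With samples this small the differences mean nothing; a real analysis would need far more visitors and a statistical significance check before declaring a winner.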
Identity Stitching
Another clickstream use case, which is becoming more relevant in the mobile internet era, is stitching a customer's activity into a single profile. For example, a customer may open a marketing email on mobile and browse some products, but switch to a desktop when it comes to purchasing. In this case, we want to know whether this is the same customer or a different one.
If we track everything with one pipeline, we can find this customer by matching their IP address, assuming that their mobile phone most likely shares the same WiFi connection as their desktop. We can also use other "marks", like cookie IDs. When a customer opens an email, we track this with a hash of their email address. If the same customer comes back to the website, we can find the same hash there as well, even when they're using a different device.
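The matching described above can be sketched as grouping identifiers that appear together in the same event, here with a small union-find structure. This is one possible approach, not necessarily the one used in the original pipeline; the identifier types (`cookie_id`, `email_hash`, `ip`) and the sample events are assumptions.

```python
def stitch_identities(events):
    """events: iterable of dicts mapping identifier type -> value,
    e.g. {"cookie_id": "c1", "ip": "203.0.113.7"}.

    Identifiers seen together in one event are merged into one profile.
    Returns a list of sets of (type, value) identifiers.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for event in events:
        ids = [(kind, value) for kind, value in event.items() if value]
        if ids:
            find(ids[0])  # register the first identifier even if it stands alone
        for other in ids[1:]:
            union(ids[0], other)

    profiles = {}
    for identifier in list(parent):
        profiles.setdefault(find(identifier), set()).add(identifier)
    return list(profiles.values())


# Made-up events: an email opened on mobile, then a desktop visit on the same WiFi,
# plus an unrelated desktop visitor on another network.
stitched = stitch_identities([
    {"cookie_id": "mobile-1", "email_hash": "e3b0c442", "ip": "203.0.113.7"},
    {"cookie_id": "desktop-9", "ip": "203.0.113.7"},
    {"cookie_id": "desktop-4", "ip": "198.51.100.2"},
])
for profile in stitched:
    print(sorted(profile))
```

Matching on IP address alone can merge unrelated visitors behind a shared network, so in practice stronger identifiers such as the email hash deserve more trust than weaker ones.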
The idea behind identity stitching is to match each customer to as many available identifiers as possible, so that we end up with an accurate profile. A business can then tailor a unique customer experience to that profile at all touch points.
Hopefully you now have a better understanding of what clickstream data is, how it can be collected and utilized, and how much it costs for a business.
Of course, data is not a magic wand that will answer all questions. But without it, competing in today's online market with companies that use data to their advantage will be more challenging than ever.
The article was originally posted on StackTome blog.
Published at DZone with permission of Evaldas Miliauskas. See the original article here.