
A Guide to Data Warehousing Clickstream Data, Part 1

In Part 1 of this two-part series, we take a look at the benefits that collecting and analyzing clickstream data brings.

By Evaldas Miliauskas · May. 29, 19 · Analysis

Table of Contents

Part 1

  • Why clickstream is so important to your online business
    • No data science without data
    • Understanding customer – key advantage
    • Going beyond charts and dashboards
  • What is clickstream data?
  • Example data output

Part 2

  • Clickstream analysis
    • Traffic analysis
    • Sales funnel analysis
    • Browse/Cart abandonment and recovery
    • Personalization
    • Tracking Experiments (A/B testing)
    • Identity Stitching
  • Conclusion

Why Clickstream Is So Important to Your Online Business

Clickstream data allows you to see what actions customers are taking on your website. Given how commerce is shifting more and more online, this data is becoming essential for your business to stay competitive. Before defining what kind of data this is, let's take a look at the main reasons why a business needs to own it in the first place.

No Data Science Without Data

The first reason to collect and own clickstream data is to be able to take advantage of data science. As the name implies, data has to come before any science can be done, and without it even the most sophisticated models won't work. This is why you would want to pursue strategic data acquisition, which will make your business more defensible in the long run.

Understanding the Customer - A Key Advantage

Clickstream is often associated with web analytics because it lets you analyze your customers' behavior. For example, you can find out how many customers drop off on the way from the landing page to completing a purchase. The advantage of owning such data is that you can filter by any trackable metric down to the individual visitor level, without the limitations of the reporting dashboards provided by web analytics tools.
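As a rough illustration of the drop-off question, here is a minimal Python sketch. The funnel step names, visitor IDs, and events are all made up for the example, and it assumes the events are already ordered by time; a real analysis would run over the warehouse tables instead.

```python
from collections import defaultdict

# Hypothetical funnel steps, in order (names are assumptions for illustration)
FUNNEL = ["landing", "product", "cart", "checkout", "purchase"]

# Hypothetical (visitor_id, page_type) events, already ordered by time
events = [
    ("v1", "landing"), ("v1", "product"), ("v1", "cart"),
    ("v1", "checkout"), ("v1", "purchase"),
    ("v2", "landing"), ("v2", "product"),
    ("v3", "landing"), ("v3", "product"), ("v3", "cart"),
    ("v4", "landing"),
]

def funnel_counts(events, funnel):
    """Count the visitors who reached each funnel step, in order."""
    progress = defaultdict(int)  # visitor -> number of funnel steps completed
    for visitor, page in events:
        step = progress[visitor]
        if step < len(funnel) and page == funnel[step]:
            progress[visitor] = step + 1
    return [sum(1 for done in progress.values() if done > i)
            for i in range(len(funnel))]

counts = funnel_counts(events, FUNNEL)
previous = [counts[0]] + counts[:-1]
for name, reached, prev in zip(FUNNEL, counts, previous):
    drop = 0.0 if prev == 0 else 1 - reached / prev
    print(f"{name:<9} visitors={reached} drop-off={drop:.0%}")
```

Because the raw events are kept per visitor, the same function can be run on any filtered subset, for example only visitors from one country or one campaign.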

You are also free to combine reports with any other data source at your disposal. For example, you can stitch together orders, paid advertising reports, geo data, and other sources, which increases the utility of your data assets. Of course, this is possible only when you have full access to the collected data set and it is available in one unified location.
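To make the idea of stitching sources together concrete, the sketch below joins clickstream records with orders to attribute revenue to campaigns. All field names (visitor_id, campaign, total) and values are assumptions for illustration, not a schema defined in this article.

```python
# Hypothetical clickstream and order records (field names are assumptions)
pageviews = [
    {"visitor_id": "v1", "country": "uk", "campaign": "spring_sale"},
    {"visitor_id": "v2", "country": "uk", "campaign": None},
    {"visitor_id": "v3", "country": "de", "campaign": "spring_sale"},
]
orders = [
    {"visitor_id": "v1", "total": 19.99},
    {"visitor_id": "v3", "total": 49.50},
    {"visitor_id": "v3", "total": 10.00},
]

def revenue_by_campaign(pageviews, orders):
    """Attribute each order to the campaign seen in the visitor's clickstream."""
    # Visitors without a campaign are treated as organic traffic
    campaign_of = {pv["visitor_id"]: pv["campaign"] or "organic"
                   for pv in pageviews}
    totals = {}
    for order in orders:
        campaign = campaign_of.get(order["visitor_id"], "unknown")
        totals[campaign] = totals.get(campaign, 0.0) + order["total"]
    return totals

for campaign, revenue in revenue_by_campaign(pageviews, orders).items():
    print(f"{campaign}: {revenue:.2f}")
```

A join like this is only possible when both data sets share a key, which is one reason to keep everything in a single location.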

Going Beyond Charts and Dashboards

Tracking KPIs with charts and dashboards is helpful for monitoring business health and detecting problems in real time. This is useful for making high-level business decisions, but to truly bring the business to the next level, the data must be used to optimize activity down to the level of each customer. One of the most popular examples is personalizing the customer experience.

Personalization can be done at different customer touch points. For example, when a customer is visiting your website, we know from the data what they have bought before and what pages they have visited. By combining a single customer's data with that of other customers, you can recommend relevant products or content tailored specifically to the customer who is browsing your website. The same approach can be extended to email, advertising campaigns, or even physical stores. This way, the customer experience can stay consistent across all touch points. For any business, this can serve as a key differentiator.

A good case study showing how taking advantage of owned data can drive business is Zara. Using data as its backbone, the company manages the inventory and displays of each of its 2,000 stores on a daily basis. This would be impossible without full access to the collected data set.

What Is Clickstream Data?

To understand how we can use a clickstream dataset, we first need to define what kind of data it contains and how it is collected. We can define a clickstream as a sequence of events that represent visitor actions on a website.

The most common and useful event is called a ‘click,’ which indicates what a visitor has viewed. Of course, we are not limited to collecting just clicks; we can also look at impressions, purchases, and any other events relevant to the business.

Furthermore, an event can include multiple contexts that enrich it, like how long the page load took or what type of browser/device the visitor is using. Essentially, good clickstream data clearly defines a full set of events which allows you to get a complete picture of customer behavior. Conceptually, we can look at events as having their own grammar.
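One way to picture an event with attached contexts is as a small data structure. The Python sketch below is only an illustration: the class and field names are our own, the page context values are borrowed from the sample event shown later in this article, and the page load context is entirely hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Context:
    schema: str  # identifies the shape of the context, e.g. a schema URI
    data: dict   # the context's own fields

@dataclass
class Event:
    app_id: str
    event_type: str   # the "verb": Pageview, Impression, Purchase, ...
    created_at: str   # when the action happened on the device
    contexts: list = field(default_factory=list)

    def find_context(self, name: str):
        """Return the data of the first context whose schema mentions `name`."""
        for ctx in self.contexts:
            if name in ctx.schema:
                return ctx.data
        return None

page_view = Event(
    app_id="joes_bikes",
    event_type="Pageview",
    created_at="2019-04-25T08:33:03.200Z",
    contexts=[
        Context("iglu:com.stacktome/page/jsonschema/1-0-2",
                {"productSection": "lamps", "language": "en"}),
        # Hypothetical performance context, not part of the article's sample
        Context("example/page_load/1-0-0", {"loadTimeMs": 840}),
    ],
)

print(page_view.find_context("page_load"))
```

Keeping contexts separate from the core event means new details can be attached later without changing the event definition itself.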

Traditionally, such events are collected using a JavaScript tracker that is loaded with the page on every request. The tracker sends a JSON POST request to a collector, which stores, validates, and enriches the event with additional data, and finally sends it to the data warehouse for further analysis. It can be visualized as below:

Image credit: https://github.com/snowplow/snowplow
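The collector's role can be sketched in a few lines of Python. This is a simplified stand-in under stated assumptions, not how Snowplow or any particular collector is implemented: the required field names are invented, and a plain list plays the part of the data warehouse.

```python
import json
from datetime import datetime, timezone

# Assumed field names; a real tracker protocol defines its own
REQUIRED_FIELDS = {"app_id", "event", "event_id", "device_created_tstamp"}

warehouse = []  # stand-in for a real data warehouse table

def collect(raw_body: str, user_agent: str) -> bool:
    """Validate, enrich, and store one tracker payload. True if stored."""
    try:
        event = json.loads(raw_body)
    except json.JSONDecodeError:
        return False  # not valid JSON
    if not isinstance(event, dict) or not REQUIRED_FIELDS.issubset(event):
        return False  # missing mandatory fields
    # Enrichment: attach data the tracker itself did not send
    event["collector_tstamp"] = datetime.now(timezone.utc).isoformat()
    event["useragent"] = user_agent
    warehouse.append(event)
    return True

payload = json.dumps({
    "app_id": "joes_bikes",
    "event": "Pageview",
    "event_id": "e8468c4a-5d95-42aa-81e1-c72d27a5018a",
    "device_created_tstamp": "2019-04-25T08:33:03.200Z",
})
print(collect(payload, "Mozilla/5.0"), collect("not json", "Mozilla/5.0"))
```

Rejecting malformed payloads at the collector keeps bad records out of the warehouse, so later analysis does not have to defend against them.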

Later in the article, we’ll take a look at different options for tracking events.

Example Data Output

The best way to gain a deeper understanding of clickstream data is to look at particular examples. Below is a sample page view event:

  • APP: joes_bikes

  • EVENT: Pageview

  • TIME: Thu, 25 Apr 2019 08:33:03 GMT

  • COLLECTOR: collector.stacktome.com

  • METHOD: POST

Beacon

  • Event Type (string): Pageview
  • Application ID (string): joes_bikes
  • Event ID (string): e8468c4a-5d95-42aa-81e1-c72d27a5018a
  • Device Created Timestamp (string): 2019-04-25T08:33:03.200Z
  • Device Sent Timestamp (string): 2019-04-25T08:33:03.204Z
  • Platform (string): web
  • Tracker Name (string): cf2
  • Tracker Version (string): js-2.8.2

Context (iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-0)

  • 0: iglu:com.stacktome/page/jsonschema/1-0-2
    • id (number): 242894
    • language (string): en
    • country (string): uk
    • productSection (string): lamps
    • deliveryTiming (string): next-day
    • searchTag (string): front lamps
    • type (string): HomePage
    • canonicalUrl (string): https://www.joesbikes.com/lamps
    • domainName (string): https://www.joesbikes.com

Browser

  • Browser User Agent (string): Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/73.0.3683.86 Chrome/73.0.3683.86 Safari/537.36
  • Browser Language (string): en-US



In the table above you can see a sample of data sent from a fictional online store, joesbikes.com, which is based on a real tracking event. The most essential fields are the event timestamps, which allow events to be analyzed as a time series.
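As a small illustration of what the timestamps make possible, the Python sketch below parses the two timestamps from the sample event to measure how long the event waited on the device before being sent, then buckets a few events per minute. The additional event times in the list are invented for the example.

```python
from collections import Counter
from datetime import datetime, timedelta

def parse_ts(value: str) -> datetime:
    # The sample timestamps are ISO 8601 with a trailing "Z" for UTC
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%f%z")

# Device created and device sent timestamps from the sample event
created = parse_ts("2019-04-25T08:33:03.200Z")
sent = parse_ts("2019-04-25T08:33:03.204Z")
latency_ms = (sent - created) // timedelta(milliseconds=1)
print(f"sent {latency_ms} ms after creation")

# Hypothetical event times, bucketed per minute to form a simple time series
event_times = [
    "2019-04-25T08:33:03.200Z",
    "2019-04-25T08:33:41.010Z",
    "2019-04-25T08:34:05.500Z",
]
per_minute = Counter(
    parse_ts(t).replace(second=0, microsecond=0) for t in event_times
)
for minute, count in sorted(per_minute.items()):
    print(minute.strftime("%H:%M"), count)
```

The same bucketing works at any granularity (hour, day, week), which is what turns individual events into traffic trends.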

Another important part is the custom page context, which describes the details of the viewed page. A notable field is the search tag, as it shows what the user was searching for and lets us check whether that matches the page they viewed. Combining such events into a sequence allows us to see whether the path the user takes to a purchase is optimal or whether there are ways to improve it and, at the same time, improve conversion rates.
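A very simple version of the search tag check might look like the Python sketch below. The matching rule (the product section must appear as a word in the search tag) is our own assumption for illustration; a real implementation would likely need stemming or a product taxonomy. The second page context is invented.

```python
def search_matches_page(page_context: dict) -> bool:
    """True if the viewed product section appears as a word in the search tag."""
    tag_words = set(page_context.get("searchTag", "").lower().split())
    section = page_context.get("productSection", "").lower()
    return bool(section) and section in tag_words

# Page context fields from the sample event
sample = {"searchTag": "front lamps", "productSection": "lamps"}
# Hypothetical mismatch: the visitor searched for helmets but landed on gloves
mismatch = {"searchTag": "kids helmets", "productSection": "gloves"}

views = [sample, mismatch]
matched = sum(search_matches_page(v) for v in views)
print(f"{matched} of {len(views)} page views matched the search tag")
```

A high share of mismatches for a given search tag would suggest that visitors are not being led to the pages they were looking for.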

Lastly, we can see that we also get browser information. This can be useful for understanding what types of devices your visitors are using, and especially whether there are problems with rendering certain pages. For instance, we can analyze whether our mobile visitors convert at the same rate as desktop users. Given how important the mobile experience is today, it's critical for a business to have this visibility.
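The mobile versus desktop comparison can be sketched in Python as follows. The device check is a deliberately crude heuristic (looking for "Mobi" in the user agent string), and the sessions, the mobile user agent, and the purchased flag are all hypothetical.

```python
DESKTOP_UA = ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Ubuntu Chromium/73.0.3683.86 "
              "Chrome/73.0.3683.86 Safari/537.36")  # from the sample event
MOBILE_UA = ("Mozilla/5.0 (Linux; Android 9; Pixel 3) AppleWebKit/537.36 "
             "(KHTML, like Gecko) Chrome/73.0.3683.90 "
             "Mobile Safari/537.36")  # hypothetical mobile visitor

def device_type(user_agent: str) -> str:
    # Crude heuristic: most mobile browsers include "Mobi" in the user agent
    return "mobile" if "Mobi" in user_agent else "desktop"

def conversion_by_device(sessions):
    """Return the share of sessions that ended in a purchase, per device type."""
    stats = {}  # device -> (sessions, purchases)
    for session in sessions:
        device = device_type(session["user_agent"])
        visits, purchases = stats.get(device, (0, 0))
        stats[device] = (visits + 1, purchases + int(session["purchased"]))
    return {device: p / v for device, (v, p) in stats.items()}

# Hypothetical sessions with a purchase flag joined in from order data
sessions = [
    {"user_agent": DESKTOP_UA, "purchased": True},
    {"user_agent": DESKTOP_UA, "purchased": False},
    {"user_agent": MOBILE_UA, "purchased": False},
    {"user_agent": MOBILE_UA, "purchased": False},
]
for device, rate in conversion_by_device(sessions).items():
    print(f"{device}: {rate:.0%}")
```

A large gap between the two rates would be a hint to look for rendering or usability problems on the weaker device type.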

Now let's have a look at a different sample event: a product impression.

Context (iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-0)

  • iglu:com.stacktome/product_impression/jsonschema/1-0-2
    • id (number): 104463
    • name (string): Joes Leather Gloves
    • type (string): Gloves
    • regularPrice (number): 29.99
    • currentPrice (number): 19.99
    • currency (string): usd
    • nReviews (number): 11732
    • avReviews (number): 4.3
    • row (number): 2
    • column (number): 1
    • containerName (string): bestsellers

Here, we can see the main attributes of a product shown on the page. The captured impression event should help us determine which product was displayed, where on the page it appeared, and which variable attributes it used. From the above event, we can see that gloves were displayed in the second row and the first column of a container called 'bestsellers.' We can also see the price and review score used for the product. This information alone is enough to determine how well displayed products perform relative to their exposure across the website. We can also determine how well they “compete” with each other given the same or different variables (price, location, etc.).
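To show how impression events could feed such a comparison, the Python sketch below computes the discount from the sample event's prices and a click-through rate per page position. The clicked flag and the extra impressions are hypothetical; in practice, clicks would be joined in from separate click events.

```python
from collections import Counter

# Product impression fields from the sample event
sample = {"id": 104463, "regularPrice": 29.99, "currentPrice": 19.99,
          "containerName": "bestsellers", "row": 2, "column": 1}

def discount(impression: dict) -> float:
    """Fraction taken off the regular price."""
    return 1 - impression["currentPrice"] / impression["regularPrice"]

def click_through_rates(impressions):
    """Clicks divided by impressions for each (container, row, column) slot."""
    shown, clicked = Counter(), Counter()
    for imp in impressions:
        slot = (imp["containerName"], imp["row"], imp["column"])
        shown[slot] += 1
        clicked[slot] += int(imp["clicked"])
    return {slot: clicked[slot] / shown[slot] for slot in shown}

# Hypothetical impressions with a "clicked" flag joined in from click events
impressions = [
    {**sample, "clicked": True},
    {**sample, "clicked": False},
    {**sample, "clicked": False},
    {**sample, "clicked": True},
    {"id": 200001, "containerName": "bestsellers", "row": 1, "column": 1,
     "clicked": False},
]

print(f"discount: {discount(sample):.0%}")
for slot, rate in click_through_rates(impressions).items():
    print(slot, f"{rate:.0%}")
```

Grouping by product ID or price band instead of slot gives the "competition" view: how the same position performs with different products or prices.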

As you can see from the examples above, the information that's being tracked is fairly trivial from a single event perspective. The power comes from having access to these events across all the pages that visitors are interacting with, over a period of time. Then you can measure which pages might need improvement or if the overall website can perform much better. We’ll take a look at a few use cases in the next article.


Published at DZone with permission of Evaldas Miliauskas. See the original article here.

Opinions expressed by DZone contributors are their own.
