8 Tips for Configuring Adobe Analytics Clickstream Data Feeds

With some planning and awareness, you'll quickly be on your way to harnessing your clickstream data to discover hidden trends, behaviors, and preferences!

By Thomas Spicer · Nov. 02, 17 · Opinion

If you are interested in cultivating a deeper understanding of the behaviors, trends, and patterns resident in your web and mobile data, then you want access to the underlying data. If you are an Adobe Analytics customer, access to the underlying data means clickstream data. Assuming you have decided to take advantage of clickstream data, there are a few things that will ensure the process goes smoothly, especially if you have a more complex setup.


Here are eight tips from the Openbridge team on how to do this.

1. Pick Your Report Suite(s)

Sounds simple, but make sure that you are clear on which RSIDs (report suite IDs) you want clickstream data from. Build a list of all the RSIDs you want Adobe to package. Some companies have multiple RSIDs, which can mean requesting delivery for each. If you have a global rollup configured for all RSIDs, you might be able to leverage that. The number of RSIDs will impact the scale and volume of your data; the more RSIDs, the greater the complexity for testing (see points 6 and 8).

2. Determine How Much History

Define the history you want to be retroactively delivered. Adobe normally sets up "go-forward" feeds, which means they do not include historical data by default. If you want history, decide whether you want the prior 3, 9, or 13 months, and check with Adobe on current limits. Please note that a backfill can take a while to set up and complete, depending on the scale of the request.

3. Delivery Period

Define the period of each delivery. We recommend a single daily export (vs. multiple per day). For a year, you would have 365 timestamped exports per RSID. Note: this is multiplied by the number of years and RSIDs. Two years of data for 10 RSIDs results in roughly 7,300 daily exports. Factor this into your testing!
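The arithmetic is worth writing down before you commit to a cadence. A minimal sketch (the year and RSID counts below are illustrative):

```python
# Rough count of export files, assuming one timestamped export per
# RSID per day. The year and RSID counts are illustrative.
years = 2
rsids = 10
daily_exports = years * 365 * rsids
print(daily_exports)  # 7300 files to ingest, track, and audit
```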

4. Delivery Method

Define the delivery method and location, including security credentials. We suggest Amazon S3, as it will simplify and minimize the resources needed for storage. SFTP is a viable alternative.
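If you go the S3 route, it is worth confirming that the credentials you hand to Adobe can actually write to the drop location before the first feed is scheduled. A minimal sketch with boto3; the bucket and prefix names are hypothetical:

```python
import boto3

BUCKET = "my-clickstream-feeds"  # hypothetical bucket name
PREFIX = "adobe/"                # hypothetical drop prefix

# Assumes the same credentials Adobe will use are in the environment.
s3 = boto3.client("s3")

# Write and remove a marker object to prove the location is writable.
s3.put_object(Bucket=BUCKET, Key=PREFIX + "_delivery_test", Body=b"ok")
s3.delete_object(Bucket=BUCKET, Key=PREFIX + "_delivery_test")
print("Delivery location is writable.")
```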

5. Packaging and Organizing Your Delivery

Define how the delivery should be organized. I suggest that everything be grouped by RSID. For example, make sure Adobe delivers each RSID to its own location (/<rsid>/20160801_hitdata.tar.gz).
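With that convention fixed, you can enumerate exactly which object keys should exist for a backfill window, which makes later cross-checks mechanical. A sketch under that assumption (the RSID below is hypothetical):

```python
from datetime import date, timedelta

def expected_keys(rsid: str, start: date, end: date) -> list[str]:
    """One expected delivery path per day, grouped by RSID, mirroring
    the /<rsid>/YYYYMMDD_hitdata.tar.gz convention above."""
    days = (end - start).days + 1
    return [
        f"{rsid}/{(start + timedelta(d)).strftime('%Y%m%d')}_hitdata.tar.gz"
        for d in range(days)
    ]

print(expected_keys("brandx_prod", date(2016, 8, 1), date(2016, 8, 3)))
# ['brandx_prod/20160801_hitdata.tar.gz', ..., 'brandx_prod/20160803_hitdata.tar.gz']
```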

6. Testing

Run through a set of test deliveries for one RSID. Validate credentials, workflow, and cross-checks. You will want to make sure you have defined a core set of tests to validate that the data meets expectations. Identify issues as early as possible, including oddities in the data. Make sure you are inspecting the underlying data during these tests against expected outcomes. Clickstream data will often reveal flaws, gaps, and bugs in tagging that were previously hidden.
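One cheap structural test: every row in hit_data.tsv should carry exactly one field per entry in column_headers.tsv. A naive sketch that splits on raw tabs (it deliberately ignores the escaping rules covered later):

```python
def validate_delivery(headers_path: str, hits_path: str) -> None:
    """Cross-check a delivered hit_data.tsv against column_headers.tsv:
    every data row should have exactly as many fields as there are
    column headers."""
    with open(headers_path) as f:
        headers = f.readline().rstrip("\n").split("\t")
    rows = 0
    with open(hits_path) as f:
        for rows, line in enumerate(f, start=1):
            fields = line.rstrip("\n").split("\t")
            if len(fields) != len(headers):
                raise ValueError(
                    f"row {rows}: {len(fields)} fields, expected {len(headers)}"
                )
    print(f"{rows} rows checked against {len(headers)} columns")

validate_delivery("column_headers.tsv", "hit_data.tsv")
```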

7. Prioritize

If you have multiple RSIDs, cadence the deliveries by priority. For example, start with the RSID for brand "X" first. You can validate that everything is as expected and then trigger the remaining RSIDs for brands "A," "B," and "C." Adobe may have some suggestions on how best to cadence the deliveries as well.

8. Ongoing Testing and Audits

Establish audits of delivered files. If there is a delivery failure, or the same data is delivered more than once, you will want an audit trail to provide to the Adobe support team. Adobe may have suggestions, but it is usually helpful to compile an audit and manifest of any gaps or issues once every couple of weeks. If needed, this would be sent to Adobe Customer Care, who can then determine a course of action to remediate any issues. The longer you wait to highlight an issue, the longer it will take Customer Care to respond to and fix it. Delays mean they may have trouble finding the batch job for a given day; Adobe may keep a job history around for only thirty days, so waiting any longer can slow the effort down.
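The audit itself can be as simple as diffing the file names observed at the drop location against the expected daily sequence. A sketch, assuming the YYYYMMDD_hitdata.tar.gz naming convention from tip 5:

```python
from collections import Counter
from datetime import date, timedelta

def audit_deliveries(observed: list[str], start: date, end: date) -> None:
    """Flag missing and duplicated daily files for one RSID, given the
    file names actually observed at the drop location."""
    seen = Counter(name[:8] for name in observed)  # leading YYYYMMDD stamp
    missing, duplicated = [], []
    day = start
    while day <= end:
        stamp = day.strftime("%Y%m%d")
        if seen[stamp] == 0:
            missing.append(stamp)
        elif seen[stamp] > 1:
            duplicated.append(stamp)
        day += timedelta(days=1)
    print("missing:", missing)
    print("duplicated:", duplicated)
```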

Additional Areas to Consider

It is also important to understand that you will need to attend to a few other areas, including post-processing and the additional pieces of data that give your clickstream data context.

Post-Processing and Special Characters

Adobe describes different sets of rules to create parity with the reporting UI. For example, here is how to calculate the visitor metric (a code sketch follows the list):

  • Exclude all rows where exclude_hit > 0.

  • Exclude all rows with hit_source = 5, 7, 8, or 9. Keep in mind that 5, 8, and 9 are summary rows uploaded using data sources, while 7 represents transaction ID data source uploads that should not be included in visit and visitor counts.

  • Combine post_visid_high with post_visid_low and count the number of unique combinations.
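Expressed against a loaded hit_data frame, the three rules are a filter plus a distinct count. A minimal pandas sketch, assuming the feed has already been parsed into a DataFrame with the named columns:

```python
import pandas as pd

def unique_visitors(hits: pd.DataFrame) -> int:
    """Apply the three rules above: drop excluded hits, drop
    summary/transaction-ID rows, then count distinct
    (post_visid_high, post_visid_low) combinations."""
    kept = hits[
        ~(hits["exclude_hit"] > 0)
        & ~hits["hit_source"].isin([5, 7, 8, 9])
    ]
    return kept.groupby(["post_visid_high", "post_visid_low"]).ngroups
```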

Since you are using raw data, these types of rules are needed to filter events and provide context. Adobe also describes how to handle various special characters native to clickstream data. For example:

  • Starting at the beginning of the file, read until you locate a tab, newline, backslash, or caret character. Perform an action based on the special character encountered:

    • Tab: Insert the string up to that point into a data store cell and continue.

    • Newline: Complete the data store row.

    • Backslash: Read the next character, insert the appropriate string literal, and continue.

    • Caret: Read the next character, insert the appropriate string literal, and continue.

These reflect some special characters, not all. You may find that new characters are introduced depending on how your tagging was implemented. If you have multiple RSIDs, each with different implementation teams and models, it is not uncommon for them to introduce their own special characters, sometimes in direct conflict with the reserved (delimiter) characters specified by Adobe.
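As a concrete illustration of the parsing rules above, here is a character-by-character sketch. The two escape maps are assumptions for illustration; consult Adobe's data feed documentation for the authoritative mappings:

```python
# Illustrative escape tables; the real mappings live in Adobe's data
# feed documentation. These two dicts are assumptions.
BACKSLASH_MAP = {"t": "\t", "n": "\n", "\\": "\\"}
CARET_MAP = {"^": "^"}

def parse_hit_data(text: str) -> list[list[str]]:
    """Walk the feed character by character, applying the rules above:
    a tab closes a cell, a newline closes a row, and backslash or caret
    escape the character that follows."""
    rows, row, cell = [], [], []
    chars = iter(text)
    for ch in chars:
        if ch == "\t":
            row.append("".join(cell))
            cell = []
        elif ch == "\n":
            row.append("".join(cell))
            cell = []
            rows.append(row)
            row = []
        elif ch == "\\":
            nxt = next(chars, "")
            cell.append(BACKSLASH_MAP.get(nxt, nxt))
        elif ch == "^":
            nxt = next(chars, "")
            cell.append(CARET_MAP.get(nxt, nxt))
        else:
            cell.append(ch)
    if cell or row:  # flush a final row with no trailing newline
        row.append("".join(cell))
        rows.append(row)
    return rows
```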

Lookup Tables

There are a few files in a clickstream feed. The event data is present in hit_data.tsv, but there is also a collection of lookup and helper files provided by Adobe. Here are a few that the delivery would contain (a sketch of joining them follows the list):

  • column_headers.tsv
  • browser.tsv
  • browser_type.tsv
  • color_depth.tsv
  • connection_type.tsv
  • country.tsv
  • javascript_version.tsv
  • languages.tsv
  • operating_systems.tsv
  • plugins.tsv
  • resolution.tsv
  • referrer_type.tsv
  • search_engines.tsv
  • event_lookup.tsv
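To put these to work, join them against hit_data.tsv by ID. A pandas sketch, assuming the standard two-column (ID, label) layout of the lookup files:

```python
import pandas as pd

def load_lookup(path: str, value_name: str) -> pd.Series:
    # Assumes each lookup file is a headerless two-column TSV: ID, label.
    df = pd.read_csv(path, sep="\t", header=None, names=["id", value_name])
    return df.set_index("id")[value_name]

# hit_data.tsv has no header row; the names come from column_headers.tsv.
headers = pd.read_csv("column_headers.tsv", sep="\t").columns
hits = pd.read_csv("hit_data.tsv", sep="\t", header=None, names=headers)

# Resolve the numeric browser ID on each hit to a readable name.
hits["browser_name"] = hits["browser"].map(load_lookup("browser.tsv", "browser_name"))
```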
Our Friend, the eVar

It is almost always the case that a customer will have their own lookup tables, especially in cases where the use of eVars was unconstrained or did not have a strategy attached. For example, eVar 1 in 2016 may have been repurposed in 2017 to represent something completely different. This means that the underlying context of the data will have changed! Make sure you have a sense of not only how eVars are used today but also how they were used in the past. Your tagging strategy and plan should provide these details. Plan accordingly.
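One way to keep that history queryable is a small date-ranged mapping alongside your tagging plan. A sketch; the dates and meanings below are purely hypothetical:

```python
from datetime import date

# Hypothetical repurposing history for eVar 1; source the real ranges
# and meanings from your tagging plan.
EVAR1_MEANING = [
    (date(2016, 1, 1), date(2016, 12, 31), "campaign_code"),
    (date(2017, 1, 1), date(9999, 12, 31), "loyalty_tier"),
]

def evar1_context(hit_date: date) -> str:
    """Return what eVar 1 meant on the date a hit was collected."""
    for start, end, meaning in EVAR1_MEANING:
        if start <= hit_date <= end:
            return meaning
    return "unknown"

print(evar1_context(date(2016, 6, 15)))  # campaign_code
```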

Good Luck!

Working with clickstream data can be rewarding, especially if your team has the initiative, expertise, and tools to tap into the rich insights that rest within it. With some simple planning and awareness of what the effort entails, you can quickly be on your way to harnessing your clickstream data to discover the hidden trends, behaviors, and preferences of your customers and visitors!

Published at DZone with permission of Thomas Spicer.
