8 Tips for Configuring Adobe Analytics Clickstream Data Feeds

With some planning and awareness, you'll quickly be on your way to harnessing your clickstream data to discover hidden trends, behaviors, and preferences!

If you are interested in cultivating a deeper understanding of the behaviors, trends, and patterns in your web and mobile data, then you want access to the underlying data. If you are an Adobe Analytics customer, access to the underlying data means clickstream data. Assuming you have decided to take advantage of clickstream data, there are a few things that will help the process go smoothly, especially if you have a more complex setup.

Here are eight tips from the Openbridge team on how to do this.

1. Pick Your Report Suite(s)

It sounds simple, but make sure that you are clear on which report suite IDs (RSIDs) you want clickstream data from. Build a list of all the RSIDs you want Adobe to package. Some companies have multiple RSIDs, which can mean requesting delivery for each. If you have a global rollup configured for all RSIDs, you might be able to leverage that. The number of RSIDs will impact the scale and volume of your data: the more RSIDs, the greater the complexity of testing (see points 6 and 8).

2. Determine How Much History

Define how much history you want delivered retroactively. Adobe normally sets up “go-forward” feeds, which means that they do not include historical data by default. If you want history, do you want the prior 3, 9, or 13 months? Check with Adobe on current limits. Please note that this can take a while to set up and complete, depending on the scale of the request.

3. Delivery Period

Define the period of each delivery. We recommend a single daily export (vs. multiple per day). For a year, you would have 365 timestamped exports for an RSID. Note: This is multiplied by the number of years and RSIDs. If we get two years of data for 10 RSIDs, this will result in about 7,300 daily exports. Factor this into your testing!
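The arithmetic above is easy to check for your own date range. The following is a minimal sketch, assuming one export per RSID per day; the function name `count_daily_exports` and the example dates are illustrative, not part of any Adobe tooling.

```python
from datetime import date


def count_daily_exports(start: date, end: date, rsid_count: int) -> int:
    """Count daily export files for an inclusive date range across all RSIDs."""
    days = (end - start).days + 1  # +1 because both endpoints are included
    return days * rsid_count


# Hypothetical request: two calendar years of history for 10 RSIDs.
total = count_daily_exports(date(2015, 1, 1), date(2016, 12, 31), 10)
print(total)
```

Counting from real calendar dates rather than multiplying by 365 accounts for leap years, which matters when you later reconcile the files you received against the files you expected.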

4. Delivery Method

Define the delivery method and location, including security credentials. We suggest Amazon S3, as it will simplify storage and minimize the resources needed to manage it. SFTP is a viable alternative.

5. Packaging and Organizing Your Delivery

Define how the delivery should be organized. I suggest that everything be grouped by RSID. For example, make sure Adobe delivers each RSID to its own location (/<rsid>/20160801_hitdata.tar.gz).
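If you want to check deliveries against this layout programmatically, a small helper can generate the expected location for each RSID and day. This is only a sketch that mirrors the example path above; the function name `delivery_path` and the RSID value are hypothetical, and the actual file naming is whatever you agree on with Adobe.

```python
from datetime import date


def delivery_path(rsid: str, day: date) -> str:
    """Build the expected location of one daily export, grouped by RSID."""
    return f"/{rsid}/{day:%Y%m%d}_hitdata.tar.gz"


# "examplersid" is a placeholder, not a real report suite ID.
print(delivery_path("examplersid", date(2016, 8, 1)))
```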

6. Testing

Run through a set of test deliveries for one RSID. Validate credentials, workflow, and cross-checks. Define a core set of tests to validate that the data meets expectations. Identify issues as early as possible, including oddities in the data. Make sure you inspect the underlying data during these tests and compare it against expected outcomes. Clickstream data will often reveal flaws, gaps, and bugs in tagging that had previously been hidden.

7. Prioritize

If you have multiple RSIDs, sequence the deliveries by priority. For example, start with the RSID for brand “X.” You can validate that everything is as expected and then trigger the remaining RSIDs for brands “A,” “B,” and “C.” Adobe may also have suggestions on how best to sequence the deliveries.

8. Ongoing Testing and Audits

Establish audits of delivered files. If a delivery fails, or the same data is delivered more than once, you will want an audit trail to give the Adobe support team. Adobe may have suggestions, but it usually helps to compile a record of any gaps or issues once every couple of weeks. If needed, send this record to Adobe Customer Care, who can then determine a course of action to remediate any issues. The longer you wait to flag an issue, the longer it will take Customer Care to respond to and fix it, because they may have trouble finding the batch job for a given day. Adobe may keep job history for thirty days, so waiting any longer can slow the effort down.
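A basic audit of this kind can be sketched in a few lines. The example below assumes you have already extracted the delivery date of every file received for a single RSID (for example, from the file names); the function name `audit_deliveries` and the report structure are illustrative assumptions, not an Adobe format.

```python
from collections import Counter
from datetime import date, timedelta


def audit_deliveries(delivered, start, end):
    """Report missing and duplicated daily deliveries for one RSID.

    delivered: iterable of date objects, one per file received.
    start, end: inclusive date range that should have been delivered.
    """
    counts = Counter(delivered)
    expected = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    missing = [d for d in expected if counts[d] == 0]
    duplicates = sorted(d for d, n in counts.items() if n > 1)
    return {"missing": missing, "duplicates": duplicates}


# Hypothetical week: August 3 never arrived and August 5 arrived twice.
received = [date(2016, 8, d) for d in (1, 2, 4, 5, 5, 6, 7)]
report = audit_deliveries(received, date(2016, 8, 1), date(2016, 8, 7))
print(report)
```

Running a check like this on a schedule gives you the specific dates to hand to Customer Care while the corresponding jobs are still likely to be on record.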

Additional Areas to Consider

You will also need to take care of a few other areas, including post-processing and the additional pieces of data that give your clickstream data context.

Special Characters

Adobe describes different sets of rules to create parity with the reporting UI. For example, here is how to calculate the visitor metric:

  • Exclude all rows where exclude_hit > 0.

  • Exclude all rows with hit_source = 5,7,8,9. Keep in mind that 5, 8, and 9 are summary rows uploaded using data sources. 7 represents transaction ID data source uploads that should not be included in visit and visitor counts. 

  • Combine post_visid_high with post_visid_low and count the number of unique combinations.
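The three rules above can be sketched as a small filter-and-count routine. This example assumes each hit has already been parsed into a dictionary keyed by column name, with values still stored as strings; the function name `count_visitors` and the sample rows are illustrative only.

```python
# hit_source values excluded from visit and visitor counts, per the rules above.
EXCLUDED_HIT_SOURCES = {"5", "7", "8", "9"}


def count_visitors(rows):
    """Count unique visitors from an iterable of hit rows (dicts of strings)."""
    visitors = set()
    for row in rows:
        if int(row["exclude_hit"]) > 0:
            continue  # rule 1: drop excluded hits
        if row["hit_source"] in EXCLUDED_HIT_SOURCES:
            continue  # rule 2: drop data source uploads
        # rule 3: a visitor is the unique (high, low) ID combination
        visitors.add((row["post_visid_high"], row["post_visid_low"]))
    return len(visitors)


# Made-up sample rows for illustration.
sample = [
    {"exclude_hit": "0", "hit_source": "1", "post_visid_high": "10", "post_visid_low": "20"},
    {"exclude_hit": "0", "hit_source": "1", "post_visid_high": "10", "post_visid_low": "20"},
    {"exclude_hit": "1", "hit_source": "1", "post_visid_high": "11", "post_visid_low": "21"},
    {"exclude_hit": "0", "hit_source": "7", "post_visid_high": "12", "post_visid_low": "22"},
    {"exclude_hit": "0", "hit_source": "1", "post_visid_high": "13", "post_visid_low": "23"},
]
print(count_visitors(sample))
```

Keeping the two ID halves as a tuple avoids accidental collisions that can occur if the values are simply concatenated as strings.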

Since you are using raw data, these types of rules are needed to filter events and provide context. Adobe also describes how to handle various special characters native to clickstream data. For example:

  • Starting at the beginning of the file, read until you locate a tab, newline, backslash, or caret character. Perform an action based on the special character encountered:

    • Tab: Insert the string up to that point into a data store cell and continue.

    • Newline: Complete the data store row.

    • Backslash: Read the next character, insert the appropriate string literal, and continue.

    • Caret: Read the next character, insert the appropriate string literal, and continue.
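The reading procedure above can be sketched as a character-by-character parser. This is a simplified illustration, not Adobe's reference implementation: it assumes that the character following a backslash or caret is always kept as a literal, and the function name `parse_hit_data` is hypothetical. A production loader for multi-valued fields would likely need to split on unescaped delimiters before unescaping.

```python
def parse_hit_data(text):
    """Split raw feed text into rows of cells, honoring escape characters."""
    rows, row, cell = [], [], []
    chars = iter(text)
    for ch in chars:
        if ch == "\t":
            # Tab: close the current cell and continue the row.
            row.append("".join(cell))
            cell = []
        elif ch == "\n":
            # Newline: close the current cell and complete the row.
            row.append("".join(cell))
            cell = []
            rows.append(row)
            row = []
        elif ch in ("\\", "^"):
            # Escape: take the next character as a literal (assumption).
            cell.append(next(chars, ""))
        else:
            cell.append(ch)
    if cell or row:
        # Flush a final row that has no trailing newline.
        row.append("".join(cell))
        rows.append(row)
    return rows


# An escaped tab stays inside its cell; an escaped comma becomes a plain comma.
print(parse_hit_data("a\tb\\\tc\nd\te^,f\n"))
```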

These are some of the special characters, not all. You may find that new characters are introduced depending on how your tagging was implemented. If you have multiple RSIDs, each with different implementation teams and models, it is not uncommon for them to introduce their own special characters, sometimes in direct conflict with the reserved (delimiter) characters specified by Adobe.

Lookup Tables

There are a few files in a clickstream feed. The event data is present in hit_data.tsv, but there is a collection of lookup and helper files provided by Adobe. Here are a few that the ZIP file would contain:

  • column_headers.tsv
  • browser.tsv
  • browser_type.tsv
  • color_depth.tsv
  • connection_type.tsv
  • country.tsv
  • javascript_version.tsv
  • languages.tsv
  • operating_systems.tsv
  • plugins.tsv
  • resolution.tsv
  • referrer_type.tsv
  • search_engines.tsv
  • event_lookup.tsv
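As a rough illustration of how these files fit together, the sketch below names the hit columns using column_headers.tsv and resolves a browser ID through browser.tsv. It assumes that column_headers.tsv holds a single tab-separated row of column names, that each lookup file has two columns (ID, then label), and that the hit data includes a `browser` column; verify each of these against your own feed. The function names are hypothetical, and this simple reader does not apply the escape handling described earlier.

```python
import csv
import io


def load_lookup(fileobj):
    """Read a two-column lookup file (ID <tab> label) into a dict."""
    reader = csv.reader(fileobj, delimiter="\t", quoting=csv.QUOTE_NONE)
    return {row[0]: row[1] for row in reader if len(row) >= 2}


def label_hits(hit_file, headers_file, browser_file):
    """Yield each hit as a dict with a human-readable browser name added."""
    headers = next(csv.reader(headers_file, delimiter="\t", quoting=csv.QUOTE_NONE))
    browsers = load_lookup(browser_file)
    for values in csv.reader(hit_file, delimiter="\t", quoting=csv.QUOTE_NONE):
        hit = dict(zip(headers, values))
        hit["browser_name"] = browsers.get(hit.get("browser"), "unknown")
        yield hit


# In-memory stand-ins for the real files; all values are made up.
headers = io.StringIO("hit_time_gmt\tbrowser\n")
browsers = io.StringIO("1\tExample Browser 1.0\n")
hits = io.StringIO("1470009600\t1\n1470009601\t2\n")
labeled = list(label_hits(hits, headers, browsers))
print(labeled)
```

The same pattern extends to the other lookup files: load each into a dictionary once, then resolve the matching column as each hit is read.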
Our Friend, the eVar

It is almost always the case that a customer will have their own lookup tables, especially where the use of eVars was unconstrained or had no strategy attached. For example, eVar1 in 2016 may have been repurposed in 2017 to represent something completely different. This means that the underlying context of the data will have changed! Make sure you have a sense of not only how eVars are used today but also how they were used in the past. Your tagging strategy and plan should provide these details. Plan accordingly.

Good Luck!

Working with clickstream data can be rewarding, especially if your team has the initiative, expertise, and tools to tap into the rich insights within it. With some simple planning and awareness of what the effort entails, you can quickly be on your way to harnessing your clickstream data to discover hidden trends, behaviors, and preferences of your customers and visitors!

Published at DZone with permission of Thomas Spicer.
