An Overview of Data Engineering for Product Experimentation
This article focuses on data engineering for supporting product experimentation, which is rapidly becoming a necessary core competency.
Data engineering is a broad field, and the term is often used as a catch-all for many different kinds of work. Anything that involves the ingestion, storage, processing, or serving of data can count as data engineering, and the nature of the work varies meaningfully with the domain of the data. In this article, we focus specifically on data engineering in support of product experimentation, which is rapidly becoming a necessary core competency for all organizations that aim to be data-driven.
Simply put, experimentation data engineering is the process of designing, building, and maintaining systems and infrastructure for collecting, storing, and analyzing data from experiments.
The image above details the high-level components that are part of any mature Experimentation Platform. Each of these components generates data that needs to be ingested and managed effectively by the experimentation data engineering function.
What's Unique About Experimentation Data Engineering?
Experimentation Spans Multiple Domains
Data engineering teams can do their best work when they understand the domain of their stakeholders and anticipate their needs effectively.
In companies with a strong experimentation culture, experimentation is leveraged for all aspects of the business:
- Non-member / not-yet-customer conversion or acquisition experiments.
- Customer engagement and retention experiments.
- Algorithm experiments.
- Outbound marketing experiments.
- New partner or payment integration experiments.
- New business model experiments.
Each of these types of experiments has its own challenges, since they focus on very different domains with very different sets of stakeholders. Further, the complexity and velocity of experimentation can vary significantly, requiring different operational support models. The publication "Online Controlled Experiments and A/B Tests" gives an excellent overview of online experimentation for readers who are interested in diving deeper.
Experimentation Data Has a Variety of Functional Stakeholders
Further, experimentation data needs to support many different types of analyses aimed at different functions in the organization:
- Reporting/Business Intelligence Analyses: The ultimate goal of experiments is to understand the impact of a product or infrastructure change on a business KPI. This analysis is eventually consumed by business stakeholders such as product managers and executives.
- Operational/Diagnostic Analyses: Experiments, by definition, are new features driven by new code changes that run against a production "stable" experience. This means that experimental data is more likely to be affected by bugs or other issues, which increases the need for operational and diagnostic analyses to ensure the fidelity of the experiment. Further, the lifecycle of each experiment needs to be maintained with appropriate metadata. These analyses are intended to be done by data scientists and engineers.
- Scientific Analyses: Experiments are a method of performing causal inference on the effect of a change on a metric of interest. Causal inference is a scientific field of study that is increasingly becoming a high priority for organizations, much like Machine Learning. Simple statistical hypothesis testing may be sufficient for most basic experiments, but we increasingly see complex techniques such as CUPED and other model-driven causal effect estimation methods being applied to experimental data. This requires significantly stronger data quality guarantees, as well as novel data system architectures that enable the computation of these statistics. Further, since this is an area of active research, experimentation data needs to be flexible enough to allow for a lot of ad-hoc analyses. The key stakeholders here are scientists and statisticians.
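To make the Scientific Analyses point more concrete, here is a minimal sketch of CUPED-style variance reduction next to a plain difference in means. It assumes each user has a pre-experiment value of the same metric to use as the covariate. The function names (`cuped_adjust`, `diff_in_means`) and the toy numbers are made up for illustration and do not come from any particular platform.

```python
from math import sqrt
from statistics import mean, variance


def cuped_adjust(metric, covariate):
    """Return CUPED-adjusted metric values for pooled control + treatment users.

    theta = cov(metric, covariate) / var(covariate). Subtracting
    theta * (covariate - mean(covariate)) leaves the overall mean unchanged
    and removes the variance explained by the pre-experiment covariate.
    """
    m_bar, c_bar = mean(metric), mean(covariate)
    cov = sum((m - m_bar) * (c - c_bar) for m, c in zip(metric, covariate))
    var = sum((c - c_bar) ** 2 for c in covariate)
    theta = cov / var if var else 0.0  # constant covariate: no adjustment
    return [m - theta * (c - c_bar) for m, c in zip(metric, covariate)]


def diff_in_means(control, treatment):
    """Return (treatment mean - control mean, standard error of that difference)."""
    diff = mean(treatment) - mean(control)
    stderr = sqrt(variance(control) / len(control) + variance(treatment) / len(treatment))
    return diff, stderr


# Toy data (illustrative only): first four users are control, last four are treatment.
pre = [10.0, 12.0, 9.0, 11.0, 10.5, 12.5, 9.5, 11.5]
post = [11.0, 13.0, 9.5, 12.0, 12.0, 14.5, 10.5, 13.0]

adjusted = cuped_adjust(post, pre)
raw_diff, raw_se = diff_in_means(post[:4], post[4:])
cuped_diff, cuped_se = diff_in_means(adjusted[:4], adjusted[4:])
print(f"raw:   diff={raw_diff:.3f} se={raw_se:.3f}")
print(f"cuped: diff={cuped_diff:.3f} se={cuped_se:.3f}")
```

The data engineering implication is that the pipeline has to join each user's in-experiment metric to a pre-experiment value of that metric, which is one reason these methods raise the bar on data quality and data modeling.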
Experimentation Data Requires a Platform-Thinking Mindset
Experimentation data needs to support a wide variety of stakeholders and use cases. To truly scale and enable organizations to become data-driven, experimentation data engineering teams need to think of themselves as creating a platform product. That means focusing on the building blocks and capabilities that are core to any experimentation setting, and enabling the customers of the platform to mix, match, and extend them as necessary.
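One way to picture the "building blocks" idea is a shared metric registry: the platform owns the grouping and evaluation logic, and each team registers its own metric definitions. The sketch below is hypothetical. The `Metric` and `MetricRegistry` classes, the event shape, and the two example metrics are assumptions made for illustration, not an existing API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical event shape: {"user_id": str, "variant": str, "name": str, "value": float}
Event = Dict[str, object]


@dataclass(frozen=True)
class Metric:
    """A reusable metric definition: a name plus a function over one variant's events."""
    name: str
    compute: Callable[[List[Event]], float]


class MetricRegistry:
    """Shared catalog that any team can extend with its own metrics."""

    def __init__(self) -> None:
        self._metrics: Dict[str, Metric] = {}

    def register(self, metric: Metric) -> None:
        if metric.name in self._metrics:
            raise ValueError(f"metric already registered: {metric.name}")
        self._metrics[metric.name] = metric

    def evaluate(
        self, events: List[Event], names: Optional[List[str]] = None
    ) -> Dict[str, Dict[str, float]]:
        """Group events by variant, then compute the selected metrics for each group."""
        by_variant: Dict[str, List[Event]] = {}
        for event in events:
            by_variant.setdefault(str(event["variant"]), []).append(event)
        selected = names if names is not None else list(self._metrics)
        return {
            variant: {n: self._metrics[n].compute(group) for n in selected}
            for variant, group in by_variant.items()
        }


def conversion_rate(events: List[Event]) -> float:
    """Share of distinct users with at least one purchase event."""
    users = {e["user_id"] for e in events}
    buyers = {e["user_id"] for e in events if e["name"] == "purchase"}
    return len(buyers) / len(users) if users else 0.0


def revenue_per_user(events: List[Event]) -> float:
    """Total purchase value divided by the number of distinct users."""
    users = {e["user_id"] for e in events}
    revenue = sum(float(e["value"]) for e in events if e["name"] == "purchase")
    return revenue / len(users) if users else 0.0


registry = MetricRegistry()
registry.register(Metric("conversion_rate", conversion_rate))
registry.register(Metric("revenue_per_user", revenue_per_user))

# Toy events, made up for illustration.
events = [
    {"user_id": "u1", "variant": "control", "name": "visit", "value": 1.0},
    {"user_id": "u1", "variant": "control", "name": "purchase", "value": 20.0},
    {"user_id": "u2", "variant": "control", "name": "visit", "value": 1.0},
    {"user_id": "u3", "variant": "treatment", "name": "visit", "value": 1.0},
    {"user_id": "u3", "variant": "treatment", "name": "purchase", "value": 30.0},
]
results = registry.evaluate(events)
print(results)
```

A new stakeholder group can add a metric with a single `register` call, without touching the evaluation code that everyone else relies on.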
Recommendations for Creating a Strong Experimentation Data Engineering Team
- Focus on self-service and enablement: Without this approach, experimentation data engineering teams will likely start drowning in support requests.
- Invest in foundational data quality tooling and processes: Errors or inconsistencies in the data can have significant impacts on the validity and reliability of the experiment results, and problems compound if not fixed early.
- Build strong relationships on all sides: This includes the software engineering teams that produce the data, the data science teams that consume it, and ultimately the product and business teams that make decisions on recommendations based on it. Treat every one of these partners as an equal stakeholder and build proactive relationships. Data engineering teams often treat only data scientists as their stakeholders, which may not always be sufficient.
- Always think in terms of building blocks, reusability, and APIs.
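As an illustration of the data quality recommendation above, the sketch below shows two checks that are commonly run on experiment assignment data: finding users who appear in more than one variant, and testing for a sample ratio mismatch between two variants with a chi-square test. These particular checks, the function names, and the input shapes are assumptions chosen for the example. The article does not prescribe a specific set of checks.

```python
import math
from collections import defaultdict


def users_in_multiple_variants(assignments):
    """Return the set of user ids assigned to more than one variant.

    `assignments` is an iterable of (user_id, variant) pairs (hypothetical shape).
    """
    seen = defaultdict(set)
    for user_id, variant in assignments:
        seen[user_id].add(variant)
    return {user_id for user_id, variants in seen.items() if len(variants) > 1}


def sample_ratio_p_value(control_count, treatment_count, expected_control_share=0.5):
    """Chi-square goodness-of-fit p-value for a two-variant traffic split.

    With two variants the statistic has one degree of freedom, so the
    p-value can be computed as erfc(sqrt(chi2 / 2)) without extra libraries.
    """
    if not 0.0 < expected_control_share < 1.0:
        raise ValueError("expected_control_share must be between 0 and 1")
    total = control_count + treatment_count
    if total == 0:
        raise ValueError("no assignments to check")
    expected_control = total * expected_control_share
    expected_treatment = total - expected_control
    chi2 = (
        (control_count - expected_control) ** 2 / expected_control
        + (treatment_count - expected_treatment) ** 2 / expected_treatment
    )
    return math.erfc(math.sqrt(chi2 / 2))


# Toy inputs, made up for illustration.
assignments = [("u1", "control"), ("u2", "treatment"), ("u1", "treatment")]
print(users_in_multiple_variants(assignments))
print(sample_ratio_p_value(5000, 5500))
```

A very small p-value from the second check suggests that the observed split differs from the configured one, which is usually worth investigating before anyone reads the experiment results.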
The field of data engineering and the practice of experimentation as a technical capability are both evolving rapidly. Experimentation is becoming a crucial aspect of data management for organizations, alongside business intelligence, reporting, and machine learning. Accordingly, we are seeing a rapid increase in the number of companies built around providing easier experimentation capabilities as a service, and the experimentation platform is emerging as a core infrastructural component for technology companies.
Opinions expressed by DZone contributors are their own.