BI Testing: Identifying Quality Issues During the DWH Design Phase
This post outlines strategies for catching data integrity issues as early as possible to reduce the resources required to address them and ensure that data.
Join the DZone community and get the full member experience.
Join For FreeDecisions in today's organizations have become increasingly data-driven and real-time, so the systems that support business decisions must be of exceptional quality. People sometimes confuse testing data warehouses that produce business intelligence (BI) reports with backend or database testing or with testing the BI reports themselves. Data warehouse testing is much more complex and diverse. Nearly everything in BI applications involves the data that "drives" intelligent decision making.
Data integrity can be compromised at all DWH/BI phases: when data is created, integrated, moved, or transformed. However, testing of data warehouses is usually deferred until late in the cycle. If testing is shortchanged (e.g., due to schedule overruns or limited resource availability), there's a high risk that critical data integrity issues may slip through the verification efforts. Even if thorough testing is performed, it's difficult and costly to address any data integrity issues exposed by this late-cycle testing. At this phase, the cause of the error can be anything from a data quality issue stemming from when the data enters the data warehouse, to a data processing issue caused by a malfunction of the business logic along the layers of the data warehouse and its BI components. This is a painstakingly tedious task and often consumes considerable resources.
My next few blogs outline strategies and best practices for catching data integrity issues as early as possible to reduce the resources required to address them and ensure that data
Identifying Quality Issues During the DWH Design Phase
The first level of testing and validation begins with the formal acceptance of the logical data model and "low-level ETL design" (LLD). All further testing and validation will be based on the understanding of each of the data elements in the model.
Data elements that are created through a transformation or summary process must be clearly identified and calculations for each of these data elements must be clearly documented and easily interpreted.
During LLD reviews and updates, special consideration should be given to typical modeling scenarios that exist in the project. For example:
- Verify that many-to-many attribute relationships are clarified and resolved.
- Verify the types of keys that are used: surrogate keys and/or natural keys.
- Verify that business analysts/DBA's review with ETL architect and developers (application), the lineage and business rules for extracting, transforming, and loading the data warehouse.
- Verify that all transformation rules, summarization rules, and matching and consolidation rules have clear specifications.
- Confirm that specified transformations, business rules and cleansing described in low-level design (LLD) and application logic specifications have been coded correctly in ETL, Java, and SQL used for data loads.
- Verify that ETL procedures are documented to monitor and control data extraction, transformation, and loading. The procedures should describe how to handle exceptions and program failures.
- Verify that data consolidation of duplicate or merged data is properly handled.
- Verify that samplings of domain transformations will be utilized to confirm they are properly changed.
- Ensure unique values exist for primary and foreign key fields between the source data and the data loaded to the warehouse.
- Validate that target data types are as specified in the design and/or the data model.
- Verify that data field types and formats are specified and implemented.
- Verify that default values are specified for fields where needed.
- Verify that processing for invalid field values in the source are defined.
- Verify that expected ranges of field values are specified.
- Verify that all keys generated by the ETL "sequence generator" are identified.
- Verify that slowly-changing dimensions are described.
Published at DZone with permission of Wayne Yaddow. See the original article here.
Opinions expressed by DZone contributors are their own.
Comments