DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports Events Over 2 million developers have joined DZone. Join Today! Thanks for visiting DZone today,
Edit Profile Manage Email Subscriptions Moderation Admin Console How to Post to DZone Article Submission Guidelines
View Profile
Sign Out
Refcards
Trend Reports
Events
Zones
Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
  1. DZone
  2. Data Engineering
  3. Databases
  4. BI Testing: Identifying Quality Issues During the DWH Design Phase

BI Testing: Identifying Quality Issues During the DWH Design Phase

This post outlines strategies for catching data integrity issues as early as possible to reduce the resources required to address them and ensure that data.

Wayne Yaddow user avatar by
Wayne Yaddow
·
Jan. 17, 19 · Tutorial
Like (2)
Save
Tweet
Share
4.34K Views

Join the DZone community and get the full member experience.

Join For Free

Decisions in today's organizations have become increasingly data-driven and real-time, so the systems that support business decisions must be of exceptional quality. People sometimes confuse testing data warehouses that produce business intelligence (BI) reports with backend or database testing or with testing the BI reports themselves. Data warehouse testing is much more complex and diverse. Nearly everything in BI applications involves the data that "drives" intelligent decision making.

Data integrity can be compromised at all DWH/BI phases: when data is created, integrated, moved, or transformed. However, testing of data warehouses is usually deferred until late in the cycle. If testing is shortchanged (e.g., due to schedule overruns or limited resource availability), there's a high risk that critical data integrity issues may slip through the verification efforts. Even if thorough testing is performed, it's difficult and costly to address any data integrity issues exposed by this late-cycle testing. At this phase, the cause of the error can be anything from a data quality issue stemming from when the data enters the data warehouse, to a data processing issue caused by a malfunction of the business logic along the layers of the data warehouse and its BI components. This is a painstakingly tedious task and often consumes considerable resources.

My next few blogs outline strategies and best practices for catching data integrity issues as early as possible to reduce the resources required to address them and ensure that data

Identifying Quality Issues During the DWH Design Phase

The first level of testing and validation begins with the formal acceptance of the logical data model and "low-level ETL design" (LLD). All further testing and validation will be based on the understanding of each of the data elements in the model.

Data elements that are created through a transformation or summary process must be clearly identified and calculations for each of these data elements must be clearly documented and easily interpreted.

During LLD reviews and updates, special consideration should be given to typical modeling scenarios that exist in the project. For example:

  • Verify that many-to-many attribute relationships are clarified and resolved.
  • Verify the types of keys that are used: surrogate keys and/or natural keys.
  • Verify that business analysts/DBA's review with ETL architect and developers (application), the lineage and business rules for extracting, transforming, and loading the data warehouse.
  • Verify that all transformation rules, summarization rules, and matching and consolidation rules have clear specifications.
  • Confirm that specified transformations, business rules and cleansing described in low-level design (LLD) and application logic specifications have been coded correctly in ETL, Java, and SQL used for data loads.
  • Verify that ETL procedures are documented to monitor and control data extraction, transformation, and loading. The procedures should describe how to handle exceptions and program failures.
  • Verify that data consolidation of duplicate or merged data is properly handled.
  • Verify that samplings of domain transformations will be utilized to confirm they are properly changed.
  • Ensure unique values exist for primary and foreign key fields between the source data and the data loaded to the warehouse.
  • Validate that target data types are as specified in the design and/or the data model.
  • Verify that data field types and formats are specified and implemented.
  • Verify that default values are specified for fields where needed.
  • Verify that processing for invalid field values in the source are defined.
  • Verify that expected ranges of field values are specified.
  • Verify that all keys generated by the ETL "sequence generator" are identified.
  • Verify that slowly-changing dimensions are described.
[Read Wayne's paper on BI and Data Warehouse Testing: Identifying Data Integrity Issues at Every DWH Phase]
Data integrity Design Data warehouse Relational database

Published at DZone with permission of Wayne Yaddow. See the original article here.

Opinions expressed by DZone contributors are their own.

Popular on DZone

  • The Real Democratization of AI, and Why It Has to Be Closely Monitored
  • A Simple Union Between .NET Core and Python
  • Kotlin Is More Fun Than Java And This Is a Big Deal
  • Implementing Infinite Scroll in jOOQ

Comments

Partner Resources

X

ABOUT US

  • About DZone
  • Send feedback
  • Careers
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 600 Park Offices Drive
  • Suite 300
  • Durham, NC 27709
  • support@dzone.com
  • +1 (919) 678-0300

Let's be friends: