Locking in a Data Vault
A data expert takes a look at the benefits of Data Vaults to development teams, and some different ways devs can use their data.
So, I’m playing a little with words here. I’m certainly not advocating locking anybody or anything in a Data Vault. I want to share how you can lock in success as you design and deliver your new Data Vault. This post is aimed specifically at helping your development team.
Most of us are challenged by change, and developers are little different. They are typically very comfortable with a set of design approaches and tools learned in the past, and that comfort routinely frames their perspective on how to tackle the future. Combine the comfort of old ways with the tight timeframes and pressures of today’s business requests, and there is seldom time to explore new options. As a result, it is easy for teams to be weighed down by outdated, limiting approaches to data infrastructure.
What we’ve learned with the evolution of the Data Vault methodology and data warehouse automation (DWA) over the past decade is that some areas within the data warehouse development process are broken. Dan Linstedt and the other contributors to the Data Vault model recognized in the early 2000s that traditional data models were not able to meet the quality and agility goals of a data warehouse serving a modern data-focused business.
The Data Vault is constructed from some very carefully defined primitives, such as hubs, links, and satellite tables, that must be defined and populated in specific ways to work as intended. If developers use old approaches or, worse still, make up new ones themselves, disaster will follow.
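To make the three primitives concrete, here is a minimal Python sketch of how hub, link, and satellite rows relate to each other. It is an illustration only, not a reference implementation: the class names, the field names, the `hash_key` helper, the choice of SHA-256, and the sample customer and order values are all assumptions made for this example.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


def hash_key(*business_keys: str) -> str:
    """Derive a deterministic key from one or more business keys (hypothetical helper)."""
    normalized = "||".join(k.strip().upper() for k in business_keys)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class HubRow:
    """A hub holds one row per unique business key."""
    hub_key: str
    business_key: str
    load_date: datetime
    record_source: str


@dataclass(frozen=True)
class LinkRow:
    """A link records a relationship between two or more hubs."""
    link_key: str
    hub_keys: tuple
    load_date: datetime
    record_source: str


@dataclass(frozen=True)
class SatelliteRow:
    """A satellite holds descriptive attributes for a hub or link at a point in time."""
    parent_key: str
    load_date: datetime
    record_source: str
    attributes: tuple  # (name, value) pairs


now = datetime.now(timezone.utc)

# Sample data, invented for illustration.
customer = HubRow(hash_key("C-1001"), "C-1001", now, "crm")
order = HubRow(hash_key("O-77"), "O-77", now, "orders")
placed = LinkRow(
    hash_key("C-1001", "O-77"),
    (customer.hub_key, order.hub_key),
    now,
    "orders",
)
details = SatelliteRow(
    customer.hub_key, now, "crm", (("city", "Auckland"), ("name", "Ada"))
)
```

The point of the sketch is that each row type has a fixed shape and a fixed way of deriving its key. If every developer invents their own normalization or key derivation, the same business key can end up with different keys in different tables, which is the kind of disaster the paragraph above warns about.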
In Data Vault 2.0, Linstedt has provided a methodology to drive best practice in the design of the data model and in the development of the function that populates it. Methodologies are great: I rely on a wonderful methodology for manually raising my computer screen to the ideal height as I write this post. But within development teams, leaving each developer to apply a methodology by hand will lead to inconsistent approaches to development, result in delays in future maintenance as other developers struggle to understand different coding styles, and ultimately lead to a skills loss for your organization when your cleverest developer dies in a freak coding accident.
Updated Data Vaults address these issues by encoding the templates of the Data Vault components, and by employing best practices in population processes and development methods, within an automated, metadata-driven design and development environment. Starting with the initial design collaboration between IT and business people, design choices are encoded in metadata. That metadata is used to auto-generate the code and scripts responsible for defining Data Vault tables and populating them with the correct data, which ensures design consistency and completeness, and coding conformity to a single set of standards. Traceability is enforced and maintenance eased. Additionally, as your developers work, everything is documented automatically, a task few enjoy or have the time to complete.
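As a rough illustration of the metadata-driven idea described above, the Python sketch below generates hub table definitions from a small metadata list. Everything in it is hypothetical: the `HUB_TEMPLATE` text, the `generate_hub_ddl` function, the column names and types, and the sample entities are assumptions for this example and do not represent the output or API of any particular data warehouse automation tool.

```python
# One shared template: every hub is generated with the same structure.
# Column names and types are assumptions made for this sketch.
HUB_TEMPLATE = """CREATE TABLE {table} (
    {key} CHAR(64) NOT NULL PRIMARY KEY,
    {business_key} VARCHAR(100) NOT NULL,
    load_date TIMESTAMP NOT NULL,
    record_source VARCHAR(50) NOT NULL
);"""


def generate_hub_ddl(meta: dict) -> str:
    """Render the DDL for one hub from its metadata entry (hypothetical helper)."""
    name = meta["entity"].lower()
    return HUB_TEMPLATE.format(
        table=f"hub_{name}",
        key=f"{name}_hk",
        business_key=meta["business_key"],
    )


# Design choices captured as metadata rather than hand-written scripts.
model = [
    {"entity": "Customer", "business_key": "customer_number"},
    {"entity": "Order", "business_key": "order_number"},
]

scripts = [generate_hub_ddl(m) for m in model]
for script in scripts:
    print(script)
```

Because every script comes from the same template, changing a standard (for example, the key length) means editing one template and regenerating, rather than hunting through scripts written in several personal styles.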
Locking in the Data Vault is all about maintaining consistency, ensuring complete documentation, and auto-generating best-practice models and code assets across design and development. Data warehouse automation is the logical foundation, and while change is hard, development teams will benefit greatly from an openness to doing it differently.