Maintaining a Data Warehouse
Maintaining a Data Warehouse
A discussion of the nature of data warehouses and the difficulties inherent in maintaining them, as well as look at a data warehouse tool.
Join the DZone community and get the full member experience.Join For Free
Cloudera Data Flow, the answer to all your real-time streaming data problems. Manage your data from edge to enterprise with a no-code approach to developing sophisticated streaming applications easily. Learn more today.
In more traditional IT projects, when a successful system is tested, deployed and in daily operation, its developers can usually sit back and take a well-deserved rest as users come on-board, and leave ongoing maintenance to a small team of bug-fixers and providers of minor enhancements. At least until the start of the next major release cycle. Developers of today’s data warehouses have no such luxury.
The measure of success of a data warehouse is only partly defined by the number and satisfaction level of active users. The nature of creative decision-making support is that users are continuously discovering new business requirements, changing their mind about what data they need, and thus demanding new data elements and structures on a weekly or monthly basis. Indeed, in some cases, the demands may arrive daily!
This need for agility in regularly delivering new and updated data to the business through the data warehouse has long been recognized by vendors and practitioners in the space. Unfortunately, such agility has proven difficult to achieve in the past. Now, ongoing digitalization of business is driving ever higher demands for new and fresh data. Current—and, in my view, short-sighted—market thinking is that a data lake filled with every conceivable sort of raw, loosely managed data will address these needs. That approach may work for non-critical, externally sourced social media and Internet of Things data. However, it really doesn’t help with the legally-binding, historical, and (increasingly) real-time internally and externally sourced data currently delivered via the data warehouse.
Fortunately, the agile and automated characteristics of the Data Vault/data warehouse automation (DWA) approach described in the design, build, and operate phases discussed in earlier posts apply also to the maintenance phase. In fact, it may be argued that these characteristics are even more important in the maintenance phase than in the earlier ones of data warehouse development.
One explicit design point of the Data Vault data model is agility. A key differentiator between Hub, Link, and Satellite tables is that they have very different usage types and temporal characteristics. Such separation of concerns allows changes in both data requirements (frequent and driven by business needs) and data sources (less frequent, but often requiring deep “data archeology”) to be handled separately and more easily than in traditional designs. In effect, the data warehouse is structured according to good engineering principles, while the data marts flow with user needs. This structuring enables continuous iteration of agile updates to the warehouse, continuing through to the marts, by reducing or eliminating rework of existing tables when addressing new needs. For a high-level explanation of how this works, see Sanjay Pande’s excellent “Agile Data Warehousing Using the Data Vault Architecture”article.
The engineered components and methodology of the Data Vault approach are particularly well-suited to the application of DWA tools, as we saw in the design and build phases. However, it is in the maintain phase that the advantages of DWA become even more apparent. Widespread automation is essential for agility in the maintenance phase, because it increases developer productivity, reduces cycle times, and eliminates many types of coding errors. WhereScape Data Vault Express incorporates key elements of the Data Vault approach within the structures, templates, and methodology it provides to improve a team’s capabilities to make the most of potential automation gains.
Furthermore, WhereScape’s metadata-driven approach means that all the design and development work done in preceding iterations of data warehouse/mart development is always immediately available to the developers of a subsequent iteration. This is provided through the extensive metadata that WhereScape stores in the relational database repository and makes available directly to developers of new tables and/or population procedures. This metadata plays an active role in the development and runtime processes of the data warehouse (and marts) and is thus guaranteed to be far more consistent and up-to-date than typical separate and manually maintained metadata stores such as spreadsheets or text documents.
In addition, WhereScape automatically generates documentation, which is automatically maintained, and related diagrams, including impact analysis, track back/forward, and so on. These artifacts aid in understanding and reducing the risk of future changes to the warehouse, by allowing developers to discover and avoid possible downstream impacts of any changes being considered.
Another key factor in ensuring agility and success in the maintenance phase is the ongoing and committed involvement of business people. WhereScape’s automated, templated approach to the entire design, build, and deployment process allows business users to be involved continuously and intimately during every stage of development and maintenance of the warehouse and marts.
With maintenance, we come to the end of our journey through the land of automating warehouses, marts, lakes, and vaults of data. At each step of the way, combining the use of the Data Vault approach with data warehouse automation tools simplifies technical procedures and eases the business path to data-driven decision making. WhereScape Data Vault Express represents a further major stride toward the goal of fully agile data delivery and use throughout the business.
Opinions expressed by DZone contributors are their own.