Rapid Data Prototyping Using a Relational Database
Rapid Data Prototyping Using a Relational Database
Join the DZone community and get the full member experience.Join For Free
Java-based (JDBC) data connectivity to SaaS, NoSQL, and Big Data. Download Now.
This is a guest post by Michael Smith: Michael Smith is a TopDown Consulting solutions architect. Michael has over 20 years experience in the BI/DW and Performance Management space. He brings deep, cross-industry experience in the areas of Business Intelligence, Data Warehousing, Enterprise Performance Management, Technical Architecture and custom enterprise solutions development. Most recently, Michael has focused on implementing enterprise-wide Master Data Management and data integration systems, enabling complete Enterprise Performance Management solutions. To review other blogs by Michael Smith, click here.
Most companies implementing an EPM, consolidation or reporting system will need to bring together metadata and data from several different sources. These sources will likely have been developed at different times using different technologies with different intentions. Each of these factors will have contributed to there being differences in the hierarchies available in each system, the content of each hierarchy, the format and type of the data available and possibly more. At some point, in some tool, these differences will have to be resolved and all of these items brought together in a common defined set of hierarchies and a standard data type. An ETL tool can be leveraged to provide for the transformation, but many organizations don’t own one. A simple set of views and stored procedures in a relational database can make this job much easier. One of the main advantages to this approach is the speed in prototyping.
I am referring to this process as “Rapid Data Prototyping” because it initially involves a few manual steps and direct database access using development tools, such as loading the data directly into tables, in order to speed up enabling the project’s other development efforts. By providing real data and metadata quickly, the whole development team can begin their tasks, such as further refining the hierarchies, developing sample cubes, and building prototype reports with real data/metadata to further their design and development discussions as quickly as possible. It also helps to identify issues with data/metadata earlier in the process which will be a huge win for the project.
The first step in the Rapid Data Prototyping process is to gather real data, or a representative subset, from several of the expected source systems and load this to the database. Simple tables are created into which data is loaded from each source. Views are then placed on each of these initial source tables which transform the data, add or split metadata, provide prefixes or produce other transformations to create a common metadata/data set. In some cases, a simple mapping table may be needed to add or transform a dimension, or provide additional details such as aliases or target specific properties. Defaults can simply be provided within the view. Stored procedures may be needed for more complex transformations. In some cases, using a stored procedure to pull the initial data from the loading table, completing a complex transformation and loading this result to another table will be the best approach for a specific source. A view can then be built on top of this results table to provide any final tweaks such as filtering. As each individual source data view is completed, it can be combined into one common data view. This view, along with the common set of metadata views, can then be leveraged by EIS or Essbase studio to create test cubes, or the data can be extracted to a file for loading into other systems such as HFM.
The views and stored procedures above can continue to change and be refined as the development effort identifies new requirements or changes in old ones. As this core of data/metadata views/processes evolves, it continues to act as the one data source for the new system and the one location where changes need to be made for all of the target systems in the new development effort.
Once the initial development work has been enabled by leveraging this approach, additional work will be needed to turn this rapid data prototype into a production system. For example, the initial data load alone may include creating interface tables to receive the initial load of source data/metadata from each system as well as developing database loading processes for each, meeting the corporate standards and likely integrating with some batch control process. Then, stored procedures can be developed for moving data into archiving tables, providing many of the transformations previously done in the prototyping views, such as adding prefixes, trimming leading zeros, and adding or mapping metadata as needed for new or existing dimensions on the data records. Now comes one of the best parts of this system: the final views—which were leveraged for the EIS/Essbase Studio models or in the SQL—pulling the data from the database can simply be changed to pull from the new tables/views which are the final production-ready developed objects without impacting the models. Changes should now be minimized due to the earlier visibility into the data/metadata due to the rapid data prototyping.
In summary, relational databases provide a great opportunity to speed the development efforts of the whole project team by providing an early look at metadata and data when using this “Rapid Data Prototyping” method. The database can be leveraged to consolidate, transform and map data/metadata from several source systems in order to provide a fast and centralized single source for a development effort. Tools such as EIS/Essbase Studio can then be leveraged to provide a sample cube to further discussions, requirements gathering and build of not only the cube, but reports as well by utilizing real data as quickly as possible while still maintaining a centrally maintained common set.
Opinions expressed by DZone contributors are their own.