Advantages of a Graph-Based Metadata Repository
Thinking of implementing a graph-based repository on your next project? You're not alone. Read on to find out what you need to know to make it work for you.
Many higher education institutions, like the University of Washington (UW), are implementing large, complex, Software-as-a-Service solutions for HR, payroll, and other administrative systems. While these enterprise systems provide useful new tools and resources, organizations face a significant challenge in helping users understand what’s changing and how to prepare for that change.
The UW is in the process of replacing its 30+ year-old HR and payroll system with a Software-as-a-Service human capital management tool. This multi-year effort is the largest administrative transformation in UW’s history, touching every person and department at the University.
Knowledge Navigator (KN) was created to serve as a metadata repository and to facilitate system migration. KN represents concepts and data relationships both visually and textually, highlighting linkages between the old and new systems in a web-based, interactive platform as seen below.
KN keeps users informed and engaged throughout the enterprise system migration by providing self-service access to conceptual and technical descriptions, definitions, lineage, interactive relationship maps, and impact analysis information.
Picking the Right Solution
When effectively governed, a metadata repository establishes a common understanding and expectations across the University. It provides a view into the flow of data, the ability to perform impact analysis, a common business vocabulary and accountability for its terms and definitions.
The comprehensive management of metadata is vital to enabling an organization to oversee changes while delivering trusted, secure data in a complex data integration environment. Solid metadata management tools play a central role in holistic system management, including system migration.
We evaluated multiple metadata solutions, from custom SharePoint repositories to custom relational database (RDBMS) solutions to commercial vendor tools. None worked well or offered the amount of customization required, and none provided the ability to “stitch” together various data sources into a complete before-and-after picture.
In 2014, at a national information management conference, the University of Notre Dame presented on graph databases. Through a subsequent collaboration with them, the UW team was inspired to move to Neo4j as the database of choice.
Keep It Simple
Many metadata and data governance efforts fail because they attempt to accomplish too much at once. We focused on a specific problem, how to most easily demonstrate change from the “old-to-new” perspective, that could return tangible value without boiling the ocean in terms of data volume or organizational and architectural complexity. The data model for Knowledge Navigator is relatively simple, as seen in the partial data model below for databases and business intelligence (BI) reports:
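To make the idea of a small graph data model concrete, here is a minimal in-memory sketch in Python. It is not the actual Knowledge Navigator model or its Neo4j schema: the node labels (Database, Table, Column, Report), the relationship types (CONTAINS, USES), and every object name are assumptions chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """A metadata object. The label and name values used below are hypothetical."""
    label: str  # e.g. "Database", "Table", "Column", "Report"
    name: str


@dataclass
class MetadataGraph:
    """A tiny in-memory stand-in for a graph database."""
    # Each edge is a (source node, relationship type, target node) triple.
    edges: set = field(default_factory=set)

    def relate(self, source, rel_type, target):
        self.edges.add((source, rel_type, target))

    def neighbors(self, source, rel_type):
        """Return nodes reached from `source` over `rel_type`, sorted by name."""
        return sorted(
            (t for s, r, t in self.edges if s == source and r == rel_type),
            key=lambda node: node.name,
        )


# Hypothetical example data, not taken from the real repository.
graph = MetadataGraph()
db = Node("Database", "example_ods")
table = Node("Table", "example_table")
column = Node("Column", "example_column")
report = Node("Report", "Example Report")

graph.relate(db, "CONTAINS", table)
graph.relate(table, "CONTAINS", column)
graph.relate(report, "USES", table)

print([node.name for node in graph.neighbors(report, "USES")])
```

Keeping every relationship as a simple typed edge is what lets new object types be added later without redesigning the model.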
Knowledge Navigator stands out among metadata repositories because it communicates changes directly to the University’s data end users. Built into KN is guidance on the impact of migrating from a mainframe system to a cloud-based system, and on how to prepare for the changes.
Data lineage maps illustrate the parallels between the old and new systems. Interactive diagrams invite users to select objects and artifacts to understand the relationships between them. KN explains the new concepts and definitions users should know to work effectively within the new system. Users can view high-level conceptual models as well as technical metadata about tables and columns.
For example, users can see that a legacy table is related to a table in the new system, that 14 columns from a legacy table relate to columns in a new table, and that there is a brand-new, yet related, table.
Data Lineage and Impact Analysis
KN identifies affected tables and columns in the reporting operational data store, as well as affected business intelligence reports. In the screenshot below, all source tables are displayed for the Academic Personnel Appointment Report.
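Finding all source tables for a report amounts to walking dependency edges outward from that report. The following is a minimal Python sketch of that traversal, assuming the dependencies are available as a simple adjacency mapping. The report name comes from the example above; the view and table names are hypothetical, and this is not the query KN itself runs.

```python
from collections import deque

# Hypothetical dependency edges: each key depends directly on the listed sources.
DEPENDS_ON = {
    "Academic Personnel Appointment Report": ["ods_view_a"],
    "ods_view_a": ["ods_table_1", "ods_table_2"],
    "ods_table_1": [],
    "ods_table_2": [],
}


def upstream_sources(start, depends_on):
    """Return every object reachable from `start` by following dependencies."""
    seen = set()
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for source in depends_on.get(current, []):
            if source not in seen:  # guard against cycles and repeated visits
                seen.add(source)
                queue.append(source)
    return sorted(seen)


print(upstream_sources("Academic Personnel Appointment Report", DEPENDS_ON))
```

Running the same walk over reversed edges would answer the opposite question: which reports are affected when a given table changes.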
The report dependency data can be exported to a CSV, and then compared with the report traffic information to do report impact analysis:
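One way to do that comparison is sketched below in Python. The column names (report, source_table, views_last_90_days), the sample rows, and the idea of ranking by view count are all assumptions for illustration; the real export and traffic data may be shaped differently.

```python
import csv
import io
from collections import defaultdict

# Hypothetical dependency export: one row per (report, source table) pair.
DEPENDENCY_CSV = """report,source_table
Report A,table_1
Report A,table_2
Report B,table_2
"""

# Hypothetical report traffic data.
TRAFFIC_CSV = """report,views_last_90_days
Report A,120
Report B,5
"""


def impacted_reports(dependency_csv, traffic_csv, changed_tables):
    """List reports that read a changed table, most-viewed first."""
    views = {
        row["report"]: int(row["views_last_90_days"])
        for row in csv.DictReader(io.StringIO(traffic_csv))
    }
    hits = defaultdict(set)
    for row in csv.DictReader(io.StringIO(dependency_csv)):
        if row["source_table"] in changed_tables:
            hits[row["report"]].add(row["source_table"])
    ranked = [
        (report, views.get(report, 0), sorted(tables))
        for report, tables in hits.items()
    ]
    # Sort by traffic (descending), then by report name for a stable order.
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


print(impacted_reports(DEPENDENCY_CSV, TRAFFIC_CSV, {"table_2"}))
```

Ranking by traffic helps decide which affected reports to rework first and which may be candidates for retirement.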
Looking ahead, on the heels of the HR/Payroll Modernization (HRPM) program, the University will replace its equally outmoded finance system. The Finance Business Transformation (FBT) program promises to be even larger and more complex than HRPM, and we are already working with FBT leadership to plan the data that will go into KN to facilitate impact analysis and user guidance throughout the project.
In addition, KN is used by internal project teams to document metadata used in the development process and not accessible to the general public, such as internal glossaries and databases, internal notes and links, and source-to-target mappings of databases, APIs, and other data sources.
Likewise, new data buildout projects benefit from the exposed source-to-target mapping and data transformations. Developers and testers save time and effort through ready access to these critical underlying details.
Using all the impact relationships, we can easily query Neo4j to show us which tables have the most dependencies, as shown below:
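The Neo4j query itself is not reproduced here. As a rough illustration of the aggregation it performs, the Python sketch below counts dependents per table from a list of impact relationships. The pair structure and all names are hypothetical.

```python
from collections import Counter

# Hypothetical impact relationships as (dependent object, table) pairs.
IMPACTS = [
    ("Report A", "table_1"),
    ("Report A", "table_2"),
    ("Report B", "table_2"),
    ("Report C", "table_2"),
    ("Report C", "table_3"),
]


def most_depended_on(impacts, limit=10):
    """Return (table, dependent count) pairs, highest count first."""
    # Deduplicate pairs so a repeated relationship is only counted once.
    counts = Counter(table for _dependent, table in set(impacts))
    # Break ties alphabetically so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


print(most_depended_on(IMPACTS, limit=2))
```

In a graph database the same result comes from grouping relationships by their target node and counting, which avoids the join logic a relational schema would need.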
Knowledge Navigator has become an essential tool for data users and metadata repository managers to understand the meaning, usage and impact of data and business concepts at the University of Washington. Using Neo4j as our database, we are well-positioned to expand its capabilities to include the metadata necessary to build it into a powerful enterprise tool.
Published at DZone with permission of Pieter Visser, DZone MVB. See the original article here.