
Advantages of a Graph-Based Metadata Repository

Thinking of implementing a graph-based repository on your next project? You're not alone. Read on to find out what you need to know to make it work for you.


Many higher education institutions, like the University of Washington (UW), are implementing large, complex, Software-as-a-Service solutions for HR, payroll, and other administrative systems. While these enterprise systems provide useful new tools and resources, organizations face a significant challenge in helping users understand what’s changing and how to prepare for that change.

The UW is in the process of replacing its 30+ year-old HR and payroll system with a Software-as-a-Service human capital management tool. This multi-year effort is the largest administrative transformation in UW’s history, touching every person and department at the University.

Knowledge Navigator (KN) was created both as a metadata repository and as a tool to facilitate system migration. KN represents concepts and data relationships both visually and textually, highlighting linkages between the old and new systems in a web-based, interactive platform, as seen below.

The University of Washington metadata repository used by the Knowledge Navigator app

KN keeps users informed and engaged throughout the enterprise system migration by providing self-service access to conceptual and technical descriptions, definitions, lineage, interactive relationship maps, and impact analysis information.

Picking the Right Solution

When effectively governed, a metadata repository establishes a common understanding and shared expectations across the University. It provides a view into the flow of data, the ability to perform impact analysis, a common business vocabulary, and accountability for its terms and definitions.

The comprehensive management of metadata is vital to enabling an organization to oversee changes while delivering trusted, secure data in a complex data integration environment. Solid metadata management tools play a central role in holistic system management, including system migration.

We evaluated multiple metadata solutions, from custom SharePoint repositories to custom relational database (RDBMS) solutions to commercial vendor tools. None worked well or offered the amount of customization required, and none provided the ability to “stitch” together various data sources into a complete before-and-after picture.

In 2014, at a national information management conference, the University of Notre Dame presented on graph databases. Through a subsequent collaboration with them, the UW team was inspired to move to Neo4j as the database of choice.

Keep It Simple

Many metadata and data governance efforts fail because they attempt to accomplish too much at once. We focused on a specific problem we wanted to solve – how to most easily demonstrate change from the “old-to-new” perspective – that could return tangible value without boiling the ocean in terms of data volume or organizational and architectural complexity. The data model for Knowledge Navigator is relatively simple, as seen in the partial data model below for databases and business intelligence (BI) reports:

The data model for the UW metadata repository
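A data model of this kind can be sketched as a small labeled graph. The following is a minimal in-memory illustration in Python, not the actual KN schema: the labels (Database, Table, Column, Report), the relationship types (CONTAINS, USES), and every object name are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    label: str  # e.g. "Table" (hypothetical label)
    name: str


@dataclass
class MetadataGraph:
    # Each edge is a (source node, relationship type, target node) triple.
    edges: set = field(default_factory=set)

    def relate(self, source, rel_type, target):
        self.edges.add((source, rel_type, target))

    def neighbors(self, source, rel_type):
        """Return nodes reachable from `source` over one `rel_type` edge."""
        return sorted(
            (t for s, r, t in self.edges if s == source and r == rel_type),
            key=lambda n: n.name,
        )


# Hypothetical example objects, for illustration only.
db = Node("Database", "ReportingODS")
table = Node("Table", "employee")
column = Node("Column", "employee_id")
report = Node("Report", "Headcount Report")

graph = MetadataGraph()
graph.relate(db, "CONTAINS", table)
graph.relate(table, "CONTAINS", column)
graph.relate(report, "USES", table)

print([n.name for n in graph.neighbors(report, "USES")])
```

Keeping the model to a handful of node and relationship types is what makes it easy to extend later, for example by adding further object types without reshaping existing data.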


Knowledge Navigator stands out among metadata repositories because of its ability to communicate changes to the University’s data end users. Built into KN is help for understanding the impact of migrating from a mainframe system to a cloud-based system, and for preparing for the changes.

Data lineage maps illustrate the parallels between the old and new systems. Interactive diagrams invite users to select objects and artifacts to understand the relationships between them. KN explains the new concepts and definitions users should know to work effectively within the new system. Users can view high-level conceptual models as well as technical metadata about tables and columns.

For example, users can see that a given legacy table is related to a counterpart table in the new system, that 14 columns from the legacy table relate to columns in the new one, and that there is a brand-new, yet related, table.

The Knowledge Navigator change management tool

Data Lineage and Impact Analysis

KN identifies affected tables and columns in the reporting operational data store, as well as affected business intelligence reports. In the screenshot below, all source tables are displayed for the Academic Personnel Appointment Report.

An Academic Personnel Appointment Report in Knowledge Navigator
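Finding all source tables for a report amounts to walking dependency relationships outward from the report. A minimal sketch in Python, assuming a simple "reads from" mapping; the report, view, and table names are hypothetical and do not come from KN:

```python
from collections import deque

# Hypothetical dependency edges: each key reads from the objects in its list.
READS_FROM = {
    "Example Appointment Report": ["appointment_view"],
    "appointment_view": ["appointment", "person"],
    "appointment": [],
    "person": [],
}


def upstream_objects(report, reads_from):
    """Return every object the report depends on, directly or indirectly."""
    seen = set()
    queue = deque(reads_from.get(report, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited; also guards against cycles
        seen.add(current)
        queue.extend(reads_from.get(current, []))
    return sorted(seen)


print(upstream_objects("Example Appointment Report", READS_FROM))
```

In a graph database the same question is typically a single variable-length path query, which is one reason lineage fits the graph model well.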

The report dependency data can be exported to CSV and then compared with report traffic information for report impact analysis:

An impact analysis of reports via the metadata repository
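One way to combine the two data sets is a simple join on report name. The sketch below is illustrative only: the column names (report, table, views), the sample rows, and the function name are assumptions, not the actual export format.

```python
import csv
import io

# Hypothetical export of report dependencies (report, table).
dependencies_csv = """report,table
Report A,appointment
Report B,appointment
Report B,person
Report C,position
"""

# Hypothetical report traffic (report, number of views in some period).
traffic_csv = """report,views
Report A,120
Report B,15
Report C,0
"""


def impacted_reports(dependencies_file, traffic_file, changed_tables):
    """List reports that use a changed table, busiest first."""
    views = {row["report"]: int(row["views"]) for row in csv.DictReader(traffic_file)}
    impacted = {
        row["report"]
        for row in csv.DictReader(dependencies_file)
        if row["table"] in changed_tables
    }
    return sorted(
        ((name, views.get(name, 0)) for name in impacted),
        key=lambda item: (-item[1], item[0]),
    )


result = impacted_reports(
    io.StringIO(dependencies_csv), io.StringIO(traffic_csv), {"appointment"}
)
print(result)
```

Sorting by traffic puts the most heavily used affected reports first, which helps decide where remediation effort should go.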

Looking ahead, on the heels of the HR/Payroll Modernization (HRPM) program, the University will replace its equally outmoded finance system. The Finance Business Transformation (FBT) program promises to be even larger and more complex than HRPM, and we are already working with FBT leadership to plan the data that will go into KN to facilitate impact analysis and user guidance throughout the project.

In addition, KN is used by internal project teams to document metadata used in the development process and not accessible to the general public, such as internal glossaries and databases, internal notes and links, and source-to-target mappings of databases, APIs, and other data sources. 

Likewise, new data buildout projects benefit from the exposed source-to-target mapping and data transformations. Developers and testers save time and effort through ready access to these critical underlying details.

The data relationships in impact analysis

Using all the impact relationships, we can easily query Neo4j to show which tables have the most dependencies, as shown below:

Learn about the advantages of a metadata repository backed by a graph database in this UW case study
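The logic behind such a ranking is a count of distinct dependents per table. A minimal Python sketch over an in-memory list of impact relationships follows; the pairs and names are hypothetical, and this stands in for, rather than reproduces, the query run against Neo4j.

```python
from collections import Counter

# Hypothetical (dependent object, table) pairs.
IMPACTS = [
    ("Report A", "appointment"),
    ("Report B", "appointment"),
    ("Report B", "person"),
    ("Report C", "appointment"),
    ("Report C", "position"),
    ("Report A", "appointment"),  # duplicate edge, counted once
]


def most_depended_on(impacts, limit=10):
    """Rank tables by the number of distinct objects that depend on them."""
    counts = Counter(table for _, table in set(impacts))
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:limit]


print(most_depended_on(IMPACTS, limit=2))
```

In Cypher, a comparable aggregation might look like MATCH (d)-[:DEPENDS_ON]->(t:Table) RETURN t.name, count(DISTINCT d) AS dependents ORDER BY dependents DESC LIMIT 10, where the DEPENDS_ON relationship type and Table label are assumed names.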


Knowledge Navigator has become an essential tool for data users and metadata repository managers to understand the meaning, usage, and impact of data and business concepts at the University of Washington. With Neo4j as our database, we are well-positioned to expand its capabilities to include the metadata necessary to build it into a powerful enterprise tool.


Published at DZone with permission of Pieter Visser, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
