Data Fabrics and Knowledge Graphs — A Symbiotic Relationship
This article takes a look at the symbiotic relationship between data fabrics and knowledge graphs.
The data fabric notion is gaining credence throughout the analyst community, in much the same way knowledge graphs have done so for years. Both technologies link all relevant data for a specific business purpose, which is why the most successful companies in the world employ them.
Amazon’s knowledge graph retains metadata about its vast product array; Google’s captures data about an exhaustive list of web entities of interest. Lesser-known organizations regularly deploy these mechanisms for everything from comprehensive customer views to manufacturing processes.
Data fabrics have a unique, symbiotic relationship with the knowledge graph movement because they substantially streamline the processes to extract data from the myriad sources that populate these platforms. In turn, knowledge graphs provide some of the fundamental capabilities enabling data fabrics to accomplish this objective.
Consequently, it's immensely significant that data fabrics are regarded as the most mature means of harmonizing and integrating data. When fueled by knowledge graph technology, these fabrics create the optimal means of aligning data of all types for any singular business purpose.
Data Integration Maturity
Although there are several competing data fabric definitions (many of which hinge on the particular vendor disseminating them), nearly all of them specify a cohesive means of collecting, integrating, governing, and sharing data regardless of differences in type, format, technology, or location. Data fabrics are considered the most mature means of data integration because they organize the governance particulars of data quality, data lineage, metadata management, and the exchange of semi-structured, unstructured, and structured data—while offering a single access point to all data. They’re superior to alternative approaches involving (in ascending order of maturity):
Silos: Characterized by data trapped in individual databases and applications, silos require writing a new application for each business case requiring information from multiple sources.
Master Data Management and Data Warehouses: Although these approaches intend to support a single version of the truth, this capability is compromised by multiple warehouses and MDM domains across organizations. Plus, their relational technologies are unsuited for unstructured data and machine learning without leveraging costly data marts for specific business problems.
Data Lakes: These repositories collocate all data regardless of structure variation, yet intrinsically lack sustainable means of governance for data quality, metadata management, and traceability.
Backend Knowledge Graphs
Since the most effective data fabrics harmonize a pastiche of technologies and approaches, they also include knowledge graphs for taming the complexity of governing digital assets on the backend. These graphs are essential for governing content in terms of access management, data provenance, and data quality, while unifying the terminology used to describe those assets. Data fabrics with a digital asset knowledge graph benefit from the latter's knowledge of the content in every database and application, across business lines, in an organization. These graph-aware repositories capture everything about such data, including the tables, columns, and data types each database contains, its schema, its owners, the applications it serves, the machines it runs on, and more. This knowledge is foundational to data fabrics' ability to harmonize and integrate data.
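To make the idea concrete, here is a minimal sketch of such a digital-asset knowledge graph, modeled as plain subject-predicate-object triples. All of the names (crm_db, customers, sales_team, and so on) are hypothetical examples, not real assets or any particular vendor's API:

```python
# A digital-asset knowledge graph sketched as subject-predicate-object
# triples. Every entity and relation name here is hypothetical.
TRIPLES = {
    ("crm_db", "has_table", "customers"),
    ("customers", "has_column", "email"),
    ("email", "has_type", "varchar"),
    ("crm_db", "owned_by", "sales_team"),
    ("crm_db", "runs_on", "host-42"),
    ("billing_app", "reads_from", "crm_db"),
}

def describe(entity):
    """Return every fact the graph holds about an entity."""
    return sorted(
        (s, p, o) for (s, p, o) in TRIPLES if s == entity or o == entity
    )

# A governance query: which tables does crm_db hold, who owns it,
# where does it run, and which applications depend on it?
for s, p, o in describe("crm_db"):
    print(s, p, o)
```

The same single query answers provenance, ownership, and dependency questions that would otherwise require consulting several separate catalogs.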
Front End Knowledge Graphs
In return, front end knowledge graphs (those described in the introduction for Google, Amazon, customer 360s, and other mission-critical purposes) benefit from data fabrics' aptitude for transforming data. The chief benefit of this approach to harmonizing data is a single, easily repeatable process for ETL or ELT. This dimension of these holistic fabrics is critical for populating knowledge graphs with the diverse assortment of information required to perfect a business function, like LinkedIn's graph connecting most of the world's workforce to their past and present employers.
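The repeatable transformation step can be sketched as a single mapping function applied uniformly to rows from any source. The source names and field mappings below are hypothetical, chosen only to illustrate the pattern:

```python
# One reusable transformation applied to rows from any source system.
# Source names (crm, erp) and field mappings are hypothetical.
def transform(row, mapping):
    """Rename source-specific fields to the fabric's canonical vocabulary."""
    return {canonical: row[src] for canonical, src in mapping.items() if src in row}

crm_rows = [{"cust_name": "Acme", "cust_email": "ops@acme.example"}]
erp_rows = [{"account": "Acme", "contact": "ops@acme.example"}]

crm_map = {"name": "cust_name", "email": "cust_email"}
erp_map = {"name": "account", "email": "contact"}

# Both sources land in one shape, ready to populate a knowledge graph.
unified = ([transform(r, crm_map) for r in crm_rows]
           + [transform(r, erp_map) for r in erp_rows])
```

Because the mapping, not the pipeline, carries the source-specific knowledge, onboarding another source means writing one more mapping rather than another bespoke integration.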
In fact, emergent developments in this space make knowledge graphs considerably easier to implement—while redoubling their enterprise value—with an entity modeling approach. Organizations simply define the key entities driving their business (patients in healthcare, customers in finance, etc.), then shape all information about them into simple, time-based event objects. Such information typically includes interactions with patients or customers, their existing history, and other relevant factors. This way, all disparate data is aligned in a uniform shape supporting ad hoc analytics over any range of sources united in a data fabric. This uniformity is also ideal for machine learning feature engineering.
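A minimal sketch of the entity-event modeling approach described above: every fact about a key business entity becomes a timestamped event object in one uniform shape. The entity IDs and event kinds are hypothetical:

```python
# Entity-event modeling: one uniform, timestamped shape for all facts
# about a key entity. Entity IDs and event kinds are hypothetical.
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Event:
    entity_id: str   # the key business entity (a patient, a customer, ...)
    timestamp: date  # when the event occurred
    kind: str        # interaction type
    detail: dict     # source-specific payload

events = [
    Event("patient-1", date(2023, 1, 5), "visit", {"clinic": "north"}),
    Event("patient-1", date(2023, 3, 2), "lab_result", {"test": "a1c"}),
    Event("patient-2", date(2023, 2, 9), "visit", {"clinic": "south"}),
]

def history(entity_id, events):
    """Ad hoc analytics: an entity's full timeline, regardless of source."""
    return sorted((e for e in events if e.entity_id == entity_id),
                  key=lambda e: e.timestamp)

print([e.kind for e in history("patient-1", events)])  # ['visit', 'lab_result']
```

Because every source contributes events in the same shape, timelines, aggregations, and machine learning features can be computed without per-source logic.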
Hand in Hand
Knowledge graphs and data fabrics are an optimal combination for integrating and analyzing data, and for acting on it. Backend, digital asset knowledge graphs fortify the governance capabilities of data fabrics. These comprehensive frameworks, in turn, simplify the transformation at the crux of integration efforts, consistently supplying front end, business purpose knowledge graphs with the data varieties necessary to form concrete knowledge. Together, they empower the enterprise with relevant information from all of its sources, optimizing both decision making and the profitability derived from its data.