The Future of Automated Data Lineage in 2021
With the ability to fully understand how data flows from one place to another, data lineage allows business processes to become more efficient and focused.
With 2021 now upon us (finally!), businesses are shaping their strategies around lessons learned over the past year. While insights help inform future plans, such as where to allocate budget and effort, there is one essential tool that every company should have at its disposal. If you’ve read the title, this shouldn’t come as a surprise: automated data lineage. With the ability to fully understand how data flows from one place to another, data lineage allows business processes to become more efficient and focused.
Data Lineage is Like Oil
In the webinar titled 'The Essential Guide to Data Lineage in 2021,' Malcolm Chisholm, an expert in the fields of data management and data governance, shares his predictions for the coming year. To kick off the talk, he compares data lineage pathways to an oil refinery (one of our favorite analogies). Without understanding what is flowing through the pipes, we can’t determine how hot the oil is, its pressure levels, or even where it is going. Data lineage works the same way. If companies don’t have a handle on exactly which data is flowing between systems, they won’t be able to explain the numbers that end up in a report. Chisholm states that "data lineage is not just an arrow between two boxes, it’s a good deal more complicated than that." The process requires knowledge of the data that the company has acquired and an understanding of how it was stored and any obstacles that it encountered along the way. Additionally, ETL tools do more than just move data; logic is applied inside them. Only by capturing that logic as well can you understand data lineage as a whole.
Types of Data Lineage and Benefit to the Company
When it comes to data lineage, there are two main types: horizontal and vertical. Horizontal lineage is the higher-level view, showing how data flows between systems. It gives you the whole picture, but the detail only goes so far. For a more technical view, you can use vertical data lineage, which drills down to show exactly what is happening, including the specific transformations. Data lineage needs to satisfy both levels, so both views should be used together: we must track the movement of data and also take the logic applied to it into account.
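As a rough illustration of how the two views relate, the sketch below stores lineage at the detailed level (one record per column, including the transformation applied) and derives the system-to-system view from it. The class, the dataset names, and the transformation descriptions are all hypothetical; this is one possible model, not how any particular lineage tool represents its data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnEdge:
    """One detailed lineage step, including the logic applied in transit."""
    source: str     # "system.table.column" (hypothetical naming scheme)
    target: str     # "system.table.column"
    transform: str  # free-text description of the ETL logic


# Hypothetical detailed lineage records.
EDGES = [
    ColumnEdge("crm.customers.email", "warehouse.dim_customer.email", "lowercase and trim"),
    ColumnEdge("erp.orders.amount", "warehouse.fact_sales.revenue", "convert to USD"),
    ColumnEdge("warehouse.fact_sales.revenue", "bi.sales_report.total_revenue", "sum by month"),
]


def system_of(column_path: str) -> str:
    """Return the system part of a 'system.table.column' path."""
    return column_path.split(".", 1)[0]


def system_level_view(edges):
    """High-level view: collapse detailed edges into system-to-system flows."""
    flows = {(system_of(e.source), system_of(e.target)) for e in edges}
    # Ignore movement that stays inside a single system.
    return sorted(flow for flow in flows if flow[0] != flow[1])


def detailed_view(edges, target_column):
    """Drill-down view: every edge, with its logic, that writes to a column."""
    return [e for e in edges if e.target == target_column]


print(system_level_view(EDGES))
for edge in detailed_view(EDGES, "warehouse.fact_sales.revenue"):
    print(f"{edge.source} -> {edge.target} [{edge.transform}]")
```

Storing the detailed records and deriving the high-level view from them keeps the two views consistent with each other, at the cost of needing column-level metadata up front.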
In a nutshell, data lineage replaces the way in which information is shared between people and departments. Additionally, data lineage pathways represent business processes, since the processing happens within the lineage itself. As competition increases, data lineage pathways can help guide a company’s strategy. If the company fully understands its overall data pipelines, it can properly address data governance, data quality, and data cleansing.
According to Chisholm, data lineage is essential today because it helps businesses strategically address concerns around reorganizing and re-engineering processes to be much more efficient.
Uses of Data Lineage in 2021 and Beyond
The first way in which companies can use data lineage is in planning a migration to the cloud. A major shift we will see in 2021 is that businesses will begin to move their information out of on-premises data centers and into the cloud. To achieve a smooth migration, the flow of data has to be correctly replicated, which means that companies will need a clear understanding of their existing data lineage. To prepare for a clean migration, companies must sift through their data and identify dead ends, or in other words, confirm 'what is taking up dead weight.'
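One way to picture the search for dead ends is as a graph problem: start from the reports the business actually uses, walk the lineage backwards, and flag every dataset that never gets reached. The sketch below assumes lineage is available as a simple mapping from each dataset to its downstream consumers; the dataset names and the `REPORTS` set are made up for illustration.

```python
# Hypothetical lineage: dataset -> datasets that consume it directly.
LINEAGE = {
    "crm.customers": ["staging.customers"],
    "staging.customers": ["warehouse.dim_customer"],
    "warehouse.dim_customer": ["bi.sales_report"],
    "erp.legacy_export": ["staging.legacy_orders"],
    "staging.legacy_orders": [],
    "bi.sales_report": [],
}

# Assumption: these are the end products the business still relies on.
REPORTS = {"bi.sales_report"}


def find_dead_ends(lineage, reports):
    """Return datasets that do not feed any of the given reports."""
    # Reverse the edges so we can walk from reports back to their sources.
    upstream = {}
    for source, targets in lineage.items():
        for target in targets:
            upstream.setdefault(target, []).append(source)

    live = set()
    stack = list(reports)
    while stack:
        node = stack.pop()
        if node in live:
            continue
        live.add(node)
        stack.extend(upstream.get(node, []))

    all_nodes = set(lineage) | {t for targets in lineage.values() for t in targets}
    return sorted(all_nodes - live)


print(find_dead_ends(LINEAGE, REPORTS))
```

Anything flagged this way is only a candidate for exclusion from the migration; a dataset may still be read by a consumer that the lineage graph does not capture.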
When a BI team needs assurance of the integrity of its reports, it can turn to data lineage. The team must be able to confirm the accuracy of its numbers so it can give the business a convincing explanation of what is going on in a report, and it needs to do so in a reasonable amount of time. With data lineage, the team can check upstream to understand where an error originated and which datasets it has affected. Since data lineage combines various data streams into one platform, BI professionals can pinpoint specific issues or obstacles in the flow. This helps account for discrepancies, enables stronger insights, and provides concrete evidence to support their reports.
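To make the upstream check concrete, here is a minimal sketch that combines a lineage graph with per-dataset data quality results and looks for the failing dataset whose own inputs are all healthy, which is a reasonable first place to investigate. The quality-check results, dataset names, and function name are assumptions for illustration; the webinar does not prescribe this procedure.

```python
# Hypothetical lineage: dataset -> datasets it reads from directly.
UPSTREAM = {
    "bi.revenue_report": ["warehouse.fact_sales"],
    "warehouse.fact_sales": ["staging.orders", "staging.fx_rates"],
    "staging.orders": ["erp.orders"],
    "staging.fx_rates": ["vendor.fx_feed"],
}

# Assumed output of some data quality checks (True means the check passed).
CHECK_PASSED = {
    "bi.revenue_report": False,
    "warehouse.fact_sales": False,
    "staging.orders": True,
    "staging.fx_rates": False,
    "erp.orders": True,
    "vendor.fx_feed": True,
}


def root_cause_candidates(report, upstream, check_passed):
    """Failing datasets upstream of `report` whose direct inputs all pass."""
    candidates = []
    seen = set()
    stack = [report]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        parents = upstream.get(node, [])
        node_fails = not check_passed.get(node, True)
        inputs_pass = all(check_passed.get(p, True) for p in parents)
        if node_fails and inputs_pass:
            candidates.append(node)
        stack.extend(parents)
    return sorted(candidates)


print(root_cause_candidates("bi.revenue_report", UPSTREAM, CHECK_PASSED))
```

In this example the report and the fact table both fail, but only `staging.fx_rates` fails while its own input passes, so the investigation would start there.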
Along the same lines as assurance, impact analysis can be easily achieved through automated data lineage. Without an understanding of the implications, Chisholm thoughtfully stated, “you make the change and wait to see if anyone screams.” Data lineage allows you to understand what downstream technical objects will be affected before you make any alterations. The same thought goes for broken ETL. Once you make a change upstream and the ETL breaks, data lineage helps you investigate and discern exactly what happened. Once you understand where the issue is located, you can determine how to fix it.
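Impact analysis of this kind can be sketched as a downstream traversal: given the object you intend to change, collect everything that directly or indirectly reads from it. The column names and the shape of the `DOWNSTREAM` mapping below are hypothetical, and a real lineage tool would expose this through its own interface.

```python
# Hypothetical lineage: object -> objects that read from it directly.
DOWNSTREAM = {
    "erp.orders.amount": ["staging.orders.amount"],
    "staging.orders.amount": ["warehouse.fact_sales.revenue"],
    "warehouse.fact_sales.revenue": ["bi.revenue_report.total", "bi.forecast.baseline"],
    "crm.customers.email": ["warehouse.dim_customer.email"],
}


def impacted_by(changed, downstream):
    """Return every object reachable downstream of `changed`."""
    affected = set()
    stack = [changed]
    while stack:
        current = stack.pop()
        for child in downstream.get(current, []):
            if child not in affected:
                affected.add(child)
                stack.append(child)
    return sorted(affected)


# Check what a change to the source column would touch before making it.
for name in impacted_by("erp.orders.amount", DOWNSTREAM):
    print(name)
```

Running the check before the change turns "wait to see if anyone screams" into a list of owners to notify in advance.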
Data lineage is necessary for maintaining compliance. Whether a company must comply with GDPR, CPRA, or BCBS-239, or simply wants to gain control of its PII, it must rely on accurate data lineage. With a clear understanding of their data flow, companies can see which processes are involved, which allows them to track their data in a more meaningful way and achieve data governance. This is especially important when it comes to PII. Companies must understand where each new source of data came from so it can be classified and identified accordingly. You can also see which reports are receiving this data, which is essential for compliance. Data lineage helps ensure that all information is saved properly and without discrepancies.
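As a simple illustration of tracking PII through lineage, the sketch below tags the columns known to hold PII at the source and pushes that tag along every downstream edge, then lists the report fields that end up receiving it. The column names, the `bi.` prefix convention for reports, and the rule that every downstream column inherits the tag are assumptions; in practice a transformation such as hashing or aggregation might remove the sensitivity.

```python
# Hypothetical lineage: column -> columns that read from it directly.
DOWNSTREAM = {
    "crm.customers.email": ["warehouse.dim_customer.email"],
    "warehouse.dim_customer.email": ["bi.marketing_report.contact"],
    "erp.orders.amount": ["warehouse.fact_sales.revenue"],
    "warehouse.fact_sales.revenue": ["bi.revenue_report.total"],
}

# Assumption: source columns already classified as PII.
PII_SOURCES = {"crm.customers.email"}


def propagate_pii(downstream, pii_sources):
    """Conservatively tag every column derived from a PII source."""
    tagged = set(pii_sources)
    stack = list(pii_sources)
    while stack:
        node = stack.pop()
        for child in downstream.get(node, []):
            if child not in tagged:
                tagged.add(child)
                stack.append(child)
    return tagged


def reports_with_pii(tagged, report_prefix="bi."):
    """Report fields (identified here by a naming prefix) that receive PII."""
    return sorted(name for name in tagged if name.startswith(report_prefix))


tagged = propagate_pii(DOWNSTREAM, PII_SOURCES)
print(reports_with_pii(tagged))
```

The conservative rule over-reports rather than under-reports, which is usually the safer default for a compliance review.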
Chisholm wraps up the webinar by summarizing the main advantages of automated data lineage. Through automation, companies can scale more easily and understand issues quickly, capturing complex data flows in greater detail and with greater accuracy. Since data lineage is not simply data movement, this tool allows businesses to fully comprehend their data’s flow and any implications that a change or migration will have.
Published at DZone with permission of Or Hillel.