You Should Be Automating Your Data Flow Map
Mapping data for GDPR compliance is difficult and ripe for errors — automating the hard parts is the answer to protecting your customers and your company.
Mapping and cataloging the personal information you collect from users is time-consuming and error-prone, and it relies on hunting down information from multiple departments. For many teams, creating an accurate data flow map will be the hardest part of completing the data protection impact assessment (DPIA) described in GDPR Article 35, or any privacy impact assessment (PIA).
Even for smaller businesses with limited departments and fewer software offerings, determining how data exists and how it moves can be a challenge. The same goes for adhering to the GDPR's Article 30, where controllers are expected to keep records of data collection and processing.
The easiest way to map personal data in your business, whether it is PI, PII, or any of the variations of user data, is by automating as much of the process as possible. Preparing for GDPR compliance, implementing a privacy information management system (PIMS) based on a standard like ISO/IEC 27701, or working toward future privacy regulation can all be made easier by using automated data flow mapping.
Automation Should Be at the Core of Your Processes
Asking engineers and department heads to self-assess, and to do so regularly, will undoubtedly lead to an incomplete view of the personal data you store. Even teams that use a privacy by design approach will suffer from human error. Marketing will add a new form or tracking cookie without adding it to the consent list. New features will ship that use an unchecked API. In fact, assessing vendor risk is one of the hardest and most important parts of mapping data flows.
This means you, or the data protection officer (DPO) at your organization, will need regular check-ins with each department and will need to trust that all data is accounted for and documented. You will need to know that all actions related to data usage go through you before they go live. That is a big ask, even for teams with the best of intentions.
Automation can assist in mapping the data — not only the first time a piece of data is introduced into your organization, but consistently throughout its lifecycle. This means the data protection officer on your team can regularly monitor new third-party vendors and data processors, changes to the codebase of your apps that introduce new types of personal information, and even receive alerts if anything unexpected, like a shadow API, shows up throughout the organization.
Instead of relying on teams to report in with data decisions, the DPO can receive automated reports, triage incoming notifications, and spend less time chasing individual reports. This means more time can be spent on meeting regulatory requirements.
How Automated Data Mapping Works
Many companies say they map your data, but what they mean is that they manually go through and interview your teams — or worse, they give you a wizard-like tool, so you can go through and interview your team. This process is time-consuming, costly, and has the same potential for human error that we mentioned earlier.
Truly automated systems work by scanning codebases or by running on part of the infrastructure, such as a gateway. They offer features that can:
- Look for types of data that might be personal information or sensitive information
- Identify the source of the collection
- Locate where the data is stored
- Assess what processing activities are performed, and determine if they match your data policies
- Determine where personal information moves locally within the org and if it moves outside to a third party
- Help identify privacy risks and security risks
Good automated data flow mapping tools (ADFM or ADF) organize this information into a dashboard for processing. The process can be broken down into three parts: discovery, tagging, and continuous monitoring.
Discovery
An automated data flow mapping tool will move through your applications and identify any personal information it finds. This discovery process will record the type of data, where and in what system it was found, where it goes, and where it is stored. Automated discovery is a great way of finding third-party services and integrations that leak data. Once all the information is found, it is placed into an information inventory that can then be classified and mapped.
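To make the discovery step more concrete, here is a minimal sketch in Python. It assumes the simplest possible detection strategy, matching identifier names in source text, which is far cruder than what a real ADFM tool does. The patterns, the `Finding` record, and the `discover` function are all hypothetical names made up for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical name-based patterns. The lookarounds keep "email" from
# matching inside longer words while still matching "user_email".
PATTERNS = {
    "email": re.compile(r"(?<![a-z])e_?mail(?![a-z])", re.IGNORECASE),
    "phone": re.compile(r"(?<![a-z])(phone|mobile)(?![a-z])", re.IGNORECASE),
    "date_of_birth": re.compile(
        r"(?<![a-z])(dob|date_of_birth|birth_?date)(?![a-z])", re.IGNORECASE
    ),
}


@dataclass(frozen=True)
class Finding:
    data_type: str  # what kind of personal data the line appears to touch
    source: str     # file (or system) where it was found
    line: int       # 1-based line number within that file


def discover(files: dict[str, str]) -> list[Finding]:
    """Scan a mapping of path -> source text and return an inventory of findings."""
    findings = []
    for path, text in files.items():
        for line_no, line in enumerate(text.splitlines(), start=1):
            for data_type, pattern in PATTERNS.items():
                if pattern.search(line):
                    findings.append(Finding(data_type, path, line_no))
    return findings


# Example input: two tiny in-memory "files" standing in for a codebase.
sample_files = {
    "signup.py": "def signup(form):\n    user_email = form['email']\n    save(user_email)\n",
    "util.py": "def add(a, b):\n    return a + b\n",
}
inventory = discover(sample_files)
for finding in inventory:
    print(finding)
```

Name matching like this produces both false positives and false negatives, which is one reason a human review step still matters.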
Tagging and Classification
Knowing personal data exists within part of your application isn't enough to be compliant. You need to tag it in a way that allows you to follow its movement through your organization, and even outside it. Regulations like the California Privacy Rights Act (CPRA), which amends the California Consumer Privacy Act (CCPA), require special treatment for personal information that is considered sensitive. This tagging process adds context to the data. An ADFM can assist in categorizing data, flagging sensitive information, and even identifying where it moves, both at a service level and potentially at a geographical level. Knowing when information crosses borders is increasingly important, as the GDPR and regulations like it include clauses that prohibit or limit moving data outside the regulation's jurisdiction.
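As a rough illustration of tagging, the sketch below attaches two pieces of context to each inventory item: whether the data type is considered sensitive, and whether its destination lies outside a set of "home" regions. The set of sensitive types, the region labels, and the class names are assumptions chosen for the example, not definitions taken from any regulation or tool.

```python
from dataclasses import dataclass

# Assumed for illustration only: your own policies decide what is sensitive
# and which regions count as inside the regulation's territory.
SENSITIVE_TYPES = {"health_record", "biometric_id", "precise_geolocation"}
HOME_REGIONS = {"eu-west", "eu-central"}


@dataclass(frozen=True)
class InventoryItem:
    data_type: str           # e.g. "email" or "health_record"
    source: str              # where the data was collected
    destination_region: str  # hypothetical label for where the data ends up


@dataclass(frozen=True)
class TaggedItem:
    item: InventoryItem
    sensitive: bool        # needs special treatment under your policies
    crosses_border: bool   # leaves the regions listed in HOME_REGIONS


def tag(item: InventoryItem) -> TaggedItem:
    """Add sensitivity and cross-border context to a discovered item."""
    return TaggedItem(
        item=item,
        sensitive=item.data_type in SENSITIVE_TYPES,
        crosses_border=item.destination_region not in HOME_REGIONS,
    )


tagged = [
    tag(InventoryItem("health_record", "intake_form", "us-east")),
    tag(InventoryItem("email", "signup_form", "eu-west")),
]
for entry in tagged:
    print(entry.item.data_type, entry.sensitive, entry.crosses_border)
```

Tags assigned this way are a starting point. The reviewer should still be able to adjust them when the automated guess is wrong.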
Continuous Monitoring
While an automated data flow mapping tool shows immediate value when you first begin using it, the true value comes with its ability to continuously monitor for changes. With the right approach, it allows your team to receive instant notifications when new data is discovered or when existing data is used or transported in an unexpected way, and even generate regular accountability reports.
With enough information, the system can even alert you to increased vendor risk if it detects a third-party processor with known vulnerabilities or breaches. This provides real-time insights into how data is moving through your application and where it may be at risk.
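One simple way to picture continuous monitoring is as a comparison between two snapshots of the data flow inventory. The sketch below assumes each flow is reduced to a `(data_type, destination)` pair and that you maintain your own set of vendors flagged as risky. The function name and alert wording are hypothetical.

```python
def detect_changes(
    previous: set[tuple[str, str]],
    current: set[tuple[str, str]],
    flagged_vendors: set[str],
) -> list[str]:
    """Compare two inventory snapshots and return human-readable alerts."""
    alerts = []
    # Flows present now but absent from the last snapshot are new.
    for data_type, destination in sorted(current - previous):
        alerts.append(f"new flow: {data_type} -> {destination}")
    # Any current flow to a flagged vendor is reported, whether new or not.
    for data_type, destination in sorted(current):
        if destination in flagged_vendors:
            alerts.append(f"vendor risk: {data_type} -> {destination}")
    return alerts


last_week = {("email", "crm")}
today = {("email", "crm"), ("email", "analytics-vendor"), ("phone", "sms-vendor")}

for alert in detect_changes(last_week, today, flagged_vendors={"sms-vendor"}):
    print(alert)
```

In practice the snapshots would come from scheduled discovery runs, and the alerts would feed whatever notification channel the DPO already uses.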
With an automated system in place, the DPO or responsible member of your team can then review all data, adjust or add any tags if needed, and ensure it aligns with your company's privacy and data policies.
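The check against company policy can be sketched as a lookup: for each observed flow, is the destination one that your policy allows for that type of data? The policy table and function below are hypothetical, and a real policy would cover much more, such as purpose and retention.

```python
# Hypothetical policy: the destinations each data type may be sent to.
POLICY = {
    "email": {"crm", "mailing-service"},
    "payment_card": {"payment-processor"},
}


def policy_violations(flows, policy=POLICY):
    """Return (data_type, destination, reason) for flows the policy does not allow."""
    violations = []
    for data_type, destination in flows:
        allowed = policy.get(data_type)
        if allowed is None:
            # Data the policy never mentions should be reviewed, not ignored.
            violations.append((data_type, destination, "no policy defined"))
        elif destination not in allowed:
            violations.append((data_type, destination, "destination not allowed"))
    return violations


observed = [("email", "crm"), ("email", "ad-network"), ("ssn", "crm")]
for violation in policy_violations(observed):
    print(violation)
```

Treating unknown data types as violations is a deliberate choice here: it pushes anything unexpected into the reviewer's queue.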
What Automated Data Flow Mapping Doesn't Do
Automated data mapping tools sound like the perfect solution. They remove the time-consuming parts of data management, but they can't do everything. Don't trust any solution that markets itself as "hands-off". No automated system can perfectly tell you where your data is going, how long it is stored, or even which regions it lives in. AI and machine learning can help improve classification and detection, but they only make informed assumptions. These tools should always include a review process.
These tools are one part of your data compliance toolchain. They can help export data flow maps and even assist in generating parts of your privacy or data protection documents, but they are not a replacement for dedicated team members with expertise in data and privacy compliance.
Beyond Manual Auditing
Data flow maps are not the complete solution to any compliance plan. That said, they demonstrate a commitment to data security and privacy that less-complete alternatives do not. Regulators look favorably on businesses that provide detailed data maps and processing documentation. One of the more challenging parts of an audit is proving to auditors that you are actually doing what you say you are. These maps, derived from actual code and data, can show auditors that your data practices match your data promises.
Automating the discovery and personal information tagging process makes your governance, risk, and compliance tasks less tedious and frees up your team's time to focus on using user data more responsibly. It also injects privacy by design principles directly into your software development lifecycle. By integrating tools like ADFMs into your processes, you move closer to a proactive approach to data privacy and management rather than a reactive one.
Published at DZone with permission of Mark Michon. See the original article here.