Data Privacy From a Data Governance Standpoint
Data privacy is one of the most important components of data governance. This article explains the high-level implementation of data privacy in the big data ecosystem.
Data governance (DG) is the process of managing the availability, usability, integrity, privacy, and security of the data in enterprise systems based on internal data standards and policies that also control data usage. Effective data governance ensures that data is consistent and trustworthy and doesn't get misused. It's increasingly critical as organizations face new data privacy regulations and rely more and more on data analytics to help optimize operations and drive business decision-making.
Data privacy is the branch of data management that deals with allowing only authorized users to access data in compliance with data protection laws, regulations, and general privacy best practices.
Ensuring data privacy involves setting access controls to protect information from unauthorized parties, getting consent from data subjects when necessary, and maintaining data integrity.
Three Main Levels of Data Privacy
Here, data access is controlled at the column level. One or more columns may be access controlled. Data typically falls into three broad categories.
- Internal-only data: Internal information is company data and should be protected with limited controls. Examples of internal data include various policies, company-wide memos, etc.
- Confidential data: Confidential data is kept within the team or group. This information may include pricing, marketing materials, or contact information.
- Restricted data: Restricted information is highly sensitive, and its use should be limited on a need-to-know basis. Restricted information includes trade secrets, personally identifiable information (PII), and health information. Any column or combination of columns that helps identify an individual is tagged as PII. Some organizations may want more granular access control over PII data. In this scheme, PII data is further divided into subcategories. For example, first name and last name may belong to the name PII type; likewise, address, city, and zip may fall under the address PII type. These types and the corresponding column mappings must be included in data privacy policies.
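The granular PII scheme described above can be sketched as a lookup from column to PII type and from PII type to role. This is a minimal illustration only: the column names, the role names, and the `visible_columns` helper are assumptions made for the example, not part of any specific product.

```python
# Hypothetical mapping of columns to PII types (illustrative names only).
PII_COLUMN_TYPES = {
    "first_name": "name",
    "last_name": "name",
    "address": "address",
    "city": "address",
    "zip": "address",
}

# Hypothetical role that grants read access to each PII type.
PII_TYPE_ROLES = {
    "name": "pii_name_reader",
    "address": "pii_address_reader",
}


def visible_columns(columns, user_roles):
    """Return the columns a caller may read, given the roles the caller holds."""
    allowed = []
    for column in columns:
        pii_type = PII_COLUMN_TYPES.get(column)
        # Columns that are not tagged as PII are not restricted by this check.
        if pii_type is None or PII_TYPE_ROLES[pii_type] in user_roles:
            allowed.append(column)
    return allowed


columns = ["order_id", "first_name", "last_name", "city", "zip"]
print(visible_columns(columns, {"pii_name_reader"}))
```

In this sketch, a caller holding only the name role sees `order_id`, `first_name`, and `last_name`, while the address columns are withheld.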
Here, data access is controlled at the table level.
- Data Access Roles: Access to a data store, very commonly a table, may be controlled through data access roles. Often, a common data access role is created for common tables, while the owner of a specialized table may define a new role for it.
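One way to picture the common-role-plus-override idea is a small lookup in which a table falls back to the common role unless its owner has registered a dedicated one. The table names, role names, and `role_for_table` function below are hypothetical.

```python
# Hypothetical default role shared by common tables.
COMMON_TABLE_ROLE = "common_data_reader"

# Hypothetical overrides registered by the owners of specialized tables.
TABLE_ROLE_OVERRIDES = {
    "payroll": "payroll_reader",
}


def role_for_table(table_name):
    """Return the role required to read a table, defaulting to the common role."""
    return TABLE_ROLE_OVERRIDES.get(table_name, COMMON_TABLE_ROLE)


def can_read_table(table_name, user_roles):
    """Check whether the caller holds the role that guards the table."""
    return role_for_table(table_name) in user_roles


print(role_for_table("orders"))
print(role_for_table("payroll"))
```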
Here, data access is controlled at the row level.
- External Partner Data Access: Almost all organizations work with external partners, and most of these partners want access to their data to be controlled. In certain situations, this access control is applied at the record level instead of the column level. This type of access control is more complicated to implement: the value in a designated column determines the role required to access the record. This requirement is very common in the financial and healthcare sectors.
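A record-level check of this kind might look like the sketch below, where the value of a designated column is turned into a role name and rows are filtered against the caller's roles. The column name `partner_code` and the role naming convention are assumptions made for illustration; a real deployment would define its own rule for deriving the role.

```python
# Hypothetical designated column whose value determines the role of a record.
DESIGNATED_COLUMN = "partner_code"


def role_for_record(record):
    """Derive the role required to read a record from its designated column."""
    # Assumed naming convention: partner_<code>_reader, lowercased.
    return f"partner_{record[DESIGNATED_COLUMN].lower()}_reader"


def filter_rows(records, user_roles):
    """Keep only the records whose derived role the caller holds."""
    return [r for r in records if role_for_record(r) in user_roles]


records = [
    {"partner_code": "ACME", "amount": 120},
    {"partner_code": "GLOBEX", "amount": 75},
]
print(filter_rows(records, {"partner_acme_reader"}))
```

Here a caller holding only `partner_acme_reader` receives the ACME record and nothing else.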
Entities for Data Privacy Roles
There are two types of entities for which access can be provided for data privacy roles.
User ID: A user can request access to any of these roles. Access is provided separately for each environment. In most organizations, individual users are not allowed to access production servers. Hence, user IDs are not allowed to have role-based access in the production environment.
Application ID: When a use case is created, an application ID is also created for each environment. Normally, this ID is used to deploy an application on the server. Organizations use different names for this ID, e.g., faceless ID, service ID, use case ID, etc. Data privacy policies should include provisions for granting data privacy roles to these IDs as well. Unlike individual user IDs, application IDs should be able to get access to production roles.
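The difference between the two entity types can be expressed as a simple eligibility rule. The sketch below assumes an environment named `"production"` and an `EntityType` enum; both are illustrative choices, and actual policies may be more nuanced.

```python
from enum import Enum


class EntityType(Enum):
    USER = "user"
    APPLICATION = "application"


def may_hold_role(entity_type, environment):
    """Return True if this kind of ID may be granted a role in the environment."""
    if environment == "production":
        # Individual user IDs are kept out of production; application IDs are not.
        return entity_type is EntityType.APPLICATION
    # In non-production environments, both kinds of ID may request roles.
    return True


print(may_hold_role(EntityType.USER, "production"))
print(may_hold_role(EntityType.APPLICATION, "production"))
```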
Data Privacy vs. Data Security
Data privacy and data security are closely related and share some responsibilities:
Access control: Preventing unauthorized access to and use of data is the primary concern of privacy and is possible only through security.
Data integrity: Ensuring the integrity of data is both a privacy and a security concern.
But privacy and security have different areas to work on. Data security ensures the integrity, availability, and confidentiality of data. It may include data encryption, authentication, authorization mechanisms, defending against malicious attacks, etc.
Data privacy, by contrast, focuses on individuals as well as use cases. Privacy rules identify PII columns and define the access mechanism for external partner data and for some internal tables. These rules also determine the roles for each kind of access and the stakeholders of those roles. A workflow should be in place so that the respective stakeholders can grant appropriate access rights.
Data Privacy Life Cycle
The data privacy life cycle starts with identifying PII columns and, for external partner data storage, an identifier for each record. The respective roles are then created and the structure of stakeholders is defined. The cycle ends when access to the requested roles is approved and provisioned on the server. Important steps in this life cycle are discussed below.
The first step is to define roles.
- Column Level Access Roles: Define column-level access roles. If granular level PII access control is needed, then define PII roles. Identify PII columns and associate the columns with the respective role and maintain the mapping as metadata in a repository.
- Data Access: Define a common role for data storage or tables. Some data owners may want to create a separate role for their tables, and the design should be flexible enough to allow this.
- External Partner Data Access: It can be as simple as creating a new data access role for external partner data storage. If record-level restriction is needed, the identifier column should be determined first, then the algorithm that derives the role from it should be outlined, and finally the respective roles should be defined.
- Maintain column/table and role mapping in a repository
- Prepare a list of stakeholders for each role
Creating roles: Once a role is defined, it needs to be created. Normally, roles are created in the active directory.
Workflow: There should be a mechanism through which users can request access. A workflow system is very useful for this purpose. As mentioned above, a mapping of roles and stakeholders is maintained while defining a role. The workflow system can refer to that mapping and send an approval request to the set of approvers.
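The routing step of such a workflow can be sketched as follows: a request is created and addressed to every stakeholder registered for the role. The `ROLE_STAKEHOLDERS` mapping, the `AccessRequest` class, and all names in it are hypothetical stand-ins for whatever the workflow system and metadata repository actually provide.

```python
from dataclasses import dataclass, field

# Hypothetical role-to-stakeholder mapping, maintained when each role is defined.
ROLE_STAKEHOLDERS = {
    "pii_name_reader": ["privacy_officer", "data_owner"],
    "payroll_reader": ["payroll_table_owner"],
}


@dataclass
class AccessRequest:
    requester: str
    role: str
    environment: str
    pending_approvers: list = field(default_factory=list)


def submit_request(requester, role, environment):
    """Create a request and route it to every stakeholder of the role."""
    if role not in ROLE_STAKEHOLDERS:
        raise KeyError(f"no stakeholders registered for role {role!r}")
    # Copy the list so later changes to the request do not alter the mapping.
    return AccessRequest(requester, role, environment, list(ROLE_STAKEHOLDERS[role]))


request = submit_request("jdoe", "pii_name_reader", "development")
print(request.pending_approvers)
```

Rejecting requests for unknown roles up front keeps the workflow from creating requests that nobody is able to approve.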
Provisioning: Once approvals from all the stakeholders are received, the access can be provisioned in the system. Here also, the active directory can be used to provision the access.
Once the user or application access is provisioned, the rest is taken care of by data security. It is presumed that whenever data is requested, the security module checks the access level in the active directory before authorizing the request for the user or application.
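The "all approvals first, then provision" rule can be illustrated with a small gate in front of the provisioning call. In this sketch a plain dictionary stands in for the directory service, and the function names and approver names are assumptions; no real directory API is used.

```python
def all_approved(required_approvers, decisions):
    """True only when every required approver has explicitly approved."""
    # An empty approver list is treated as "not approved" to avoid open access.
    return bool(required_approvers) and all(
        decisions.get(approver) is True for approver in required_approvers
    )


def provision(directory, principal, role, required_approvers, decisions):
    """Add the principal to the role's member set once all approvals are in."""
    if not all_approved(required_approvers, decisions):
        return False
    directory.setdefault(role, set()).add(principal)
    return True


directory = {}  # stand-in for a directory service: role -> set of members
approvers = ["privacy_officer", "data_owner"]

# Only one of two approvals is present, so nothing is provisioned.
print(provision(directory, "jdoe", "pii_name_reader", approvers,
                {"privacy_officer": True}))

# Both approvals are present, so the membership is recorded.
print(provision(directory, "jdoe", "pii_name_reader", approvers,
                {"privacy_officer": True, "data_owner": True}))
```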