The ''Right to be Forgotten'' and Data Masker for SQL Server
The ''Right to be Forgotten'' and Data Masker for SQL Server
Let's take a look at the ''Right to be Forgotten,'' which is one of the main features of the new data protection legislation across the globe.
Join the DZone community and get the full member experience.Join For Free
Compliant Database DevOps and the role of DevSecOps DevOps is becoming the new normal in application development, and DevSecOps is now entering the picture. By balancing the desire to release code faster with the need for the same code to be secure, it addresses increasing demands for data privacy. But what about the database? How can databases be included in both DevOps and DevSecOps? What additional measures should be considered to achieve truly compliant database DevOps? This whitepaper provides a valuable insight. Get the whitepaper
The right to be forgotten is one of the main features of the new data protection legislation across the globe. Under Article 17 of one such piece of legislation in Europe, the GDPR, individuals have the right to have personal data erased from all systems and also data the company may be storing about them. Similarly, in the USA, the California Consumer Privacy Act will require companies to respond to requests for data access, deletion, and portability within 45 days.
While there may be many pieces of unstructured data littering our systems that hold customer data, perhaps one of the hardest problems to tackle is the erasure of personal information from our structured data and how we also handle database backups. whenever we need to restore this information either back into Production or other environments.
In some cases, we may be able to simply remove a user from the database manually or by other means on an ad-hoc basis, but where we have important data referentially depending on these contacts e.g. order history, which we may be using for analytics or other important workflows, it is not possible to do so.
So how do we "redact" user information while maintaining database integrity, especially where we may restore sensitive data over the top of this redaction?
For the purposes of this article, I'm going to be using an example of my Forums-Redgate-Com database, specifically the Contacts table, as the target to tackle with the right to be forgotten:
Figure 1: The top 10 rows of the Contacts table.
There are several approaches we can take to tackling this problem. The first such solution is never restoring from a backup that was taken prior to receiving the request to be forgotten. This in itself is the least ideal workflow because it renders all prior backups useless and effectively means we can just delete them.
The second approach is to write a script and maintain this moving forward that we run against environments once the backup has been restored:
UPDATE dbo.Contacts SET ContactFullName = 'REDACTED', PhoneWork = 'REDACTED', PhoneMobile = 'REDACTED', Address1 = 'REDACTED', Address2 = 'REDACTED', Address3 = 'REDACTED', CountryCode = NULL, JoiningDate = NULL, ModifiedDate = NULL, Email = 'REDACTED', Photo = NULL, LinkedIn = 'REDACTED' WHERE ContactsID IN (ADD CUSTOMER_IDs HERE);
But this approach also has several drawbacks. We are only running a SQL script and in case of audit it is generally preferred to have some kind of log or report that demonstrates what steps were taken to comply with the request. It would also be beneficial to have something that we can programmatically keep updated instead of having to maintain a full "Right to be Forgotten" script in the future.
The first step we should take is to have a central list of which users have requested the right to be forgotten. By keeping a list of not-personally-identifiable identifiers, like Customer_IDs, we can easily update this as and when new requests come in. This could be in the form of a CSV file, a table in a database, or however else we can securely store it. For this example, I have created a simple table on a separate database:
Figure 2: The table of forgotten Customer_IDs.
From here, we can use Redgate's Data Masker for SQL Server to define a masking set, which will redact all users that appear in this table:
Figure 3: We have defined a new substitution rule for the Contacts table to carry out a similar operation to the code we wrote above.Figure 4: Next, we apply a SQL WHERE clause to fetch the redacted IDs from the relevant table. This could be from another database as in this example or could have been loaded into the database programmatically as part of the restore job.
Now we can update the RedactedUsers table, and if we go to run the Data Masker tool, it will carry out the redaction for us. In this case, let's try ContactID number 6:
Figure 5: Customer 6 has requested the right to be forgotten.Figure 6: After the run of the masking rules, we can see only the relevant row has been redacted.
Data Masker outputs reports as part of every masking operation, and we can see below that as part of this run, only one row was processed. As we know that there is only one row in the "Right to be Forgotten" table with 12 columns to be redacted, we know that the masking has completed successfully.
Figure 7: An extract of the Data Masker Masking Report
Now that we have the foundation of our redaction process, all that is left to do is automate it as part of a restore. In this case, we can script out the calling of the Data Masker command line and call it as part of the SQL Agent job:
start/wait "" "C:\Program Files\Red Gate\Data Masker for SQL Server 6\DataMasker.exe" "C:\Users\chris.unwin\Documents\Data Masker(SqlServer)\Masking Sets\RedactForgottenUsers.DMSMaskSet" -R -X
This means that once the database is restored to a state before we initially redacted the information, we can then immediately redact it again (but we can also just maintain the list of forgotten users. In this case, we've added rows 2 and 5 over time):
Figure 8: Result of the redaction after running the command line for Data Masker after restoring.
In summary, there are many ways we can handle actioning the "Right to be Forgotten" when it is requested, whether this is handling the update manually, scripting the change, or using Redgate's Data Masker for SQL Server to achieve the end result with the level of logging and reporting we would like to retain for audit purposes.
The goal is always to ensure that we are compliant with the necessary legislation, yet we can respond to requests we receive with the lowest possible overhead both to the data subject and our own teams. The solution outlined in this article is one such approach to achieve this.
Published at DZone with permission of Chris Unwin , DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.