GDPR, Database Backups, and the Right to Be Forgotten
I've said it before, but it bears repeating: there is no cause for any kind of panic when it comes to the GDPR. None. There are, however, a number of concerns. One of those concerns is, well, concerning. How does the right to be forgotten within the GDPR impact database backups? Let's discuss what we know.
The Right to Erasure
Each of the articles within the GDPR lays out a topic. Article 17 is pretty darned clear about the topic:
Right to erasure ('right to be forgotten')
Basically, the individuals — also known as the data subjects, also known as natural persons, in short, people — can request that you remove their data from your system. The first sentence lays out the gist of the idea quite well:
The data subject shall have the right to obtain from the controller the erasure of personal data concerning him or her without undue delay and the controller shall have the obligation to erase personal data without undue delay...
Sure, there are exceptions, and it's worth reading Article 17 to understand those, but that's not the point of this discussion. The question is, what about backups? It's easy to run a DELETE statement. Heck, it's easy to put in referential integrity such that you can do a cascading delete if you so desire (I don't, but different discussion, again). When you run a DELETE statement today, it doesn't remove any data from that backup that you took last night.
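To make that concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for a production database. The person and orders tables and the function name are hypothetical, invented purely for illustration. It takes a backup, runs a cascading delete against the live database, and then counts rows in both places.

```python
import sqlite3

def delete_leaves_backup_untouched():
    # Hypothetical schema: a person plus child rows that cascade on delete.
    live = sqlite3.connect(":memory:")
    live.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    live.execute("CREATE TABLE person (person_id INTEGER PRIMARY KEY, email TEXT)")
    live.execute(
        "CREATE TABLE orders ("
        " order_id INTEGER PRIMARY KEY,"
        " person_id INTEGER NOT NULL"
        "   REFERENCES person(person_id) ON DELETE CASCADE)"
    )
    live.execute("INSERT INTO person VALUES (1, 'someone@example.com')")
    live.executemany("INSERT INTO orders VALUES (?, ?)", [(10, 1), (11, 1)])
    live.commit()

    # "Last night's backup": a full copy taken before the erasure request.
    backup = sqlite3.connect(":memory:")
    live.backup(backup)

    # Today's erasure: one DELETE, and the cascade removes the child rows too.
    live.execute("DELETE FROM person WHERE person_id = 1")
    live.commit()

    def count(db, table):
        return db.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

    return {
        "live_person": count(live, "person"),
        "live_orders": count(live, "orders"),
        "backup_person": count(backup, "person"),
        "backup_orders": count(backup, "orders"),
    }
```

The live counts drop to zero while the backup still holds the person and both orders, which is exactly the gap Article 17 leaves open.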
Nothing within Article 17 talks about backups, offsite storage, readable secondaries, log shipping, or any of that stuff. In fact, there's nothing technical there at all. No help to tell you what to do about this question.
Now, each article is accompanied by recitals, which expand on the information within the article. In the case of the right to be forgotten, there are two: Recital 65 and Recital 66. Recital 65 has nothing for us at all. Recital 66 does say that, because we're dealing with an online world, a person's data may be in more than one location, and the available technical means should be used to clean that up.
And that's it.
In fact, you can search the GDPR and not find the word backup. You can search the GDPR and find the word restore exactly once, in Article 32 which talks about:
...the ability to restore the availability and access to personal data in a timely manner in the event of a physical or technical incident;...
...basically saying that you better be able to restore your system after an outage (and that's another discussion for another day).
Now what? Well, let's look at one of the sources of the thinking behind the GDPR: the Information Commissioner's Office in the UK.
Information Commissioner's Office
The ICO is largely dealing with laws of the UK. However, some of those laws provided a lot of the basis and thought for the GDPR. In preparation for dealing with the GDPR, the ICO has a lot of information published; for example, a guide to the right to be forgotten. No, don't look. It doesn't mention backups, either.
The ICO, in support of the Data Protection Act (DPA), talks about a bunch of scenarios, including the need to protect database backups with encryption. Also, the inability to restore data is considered a breach of the DPA and probably the GDPR (see Part 5). We do, however, find some guidance around backups here:
...When data is deleted it is rarely removed entirely from the underlying storage media unless some additional steps are taken. In addition, a cloud provider is likely to have multiple copies of data stored in multiple locations to provide a more reliable service. This may include back-up tapes or other media not directly connected to the cloud. Copies of personal data stored in a cloud service may also be stored in other forms such as index structures.
74. The cloud customer must ensure that the cloud provider can delete all copies of personal data within a timescale that is in line with their own deletion schedule....
OK, not exactly detailed, but you get the core of the idea. You have to delete the data, in all its locations, but you have a set time to take care of this. That certainly sounds like I need to clean up my backups.
And that's all I can find there.
In fact, do some internet searches on your own. No one is quite sure what to do about the information stored on backups. There are a lot more questions than answers, so now what?
Dealing With Backups
So, upon receipt of a request to be erased, right after you delete the data from your production database and all the secondaries and the warehouse... sigh, you can restore all the backups, delete the data from those backups, retake the backups... double sigh.
Raise your hand if you want to do this. Neither do I. So, where does that leave us?
Let's go back to the GDPR. It uses very similar phrasing in multiple places in the Articles and the Recitals:
take reasonable steps, taking into account available technology and the means available to the controller, including technical measures
This is our defensible position. It's not reasonable (see that word) to expect us to go into offline data storage with the existing technical means, again, using their words, to delete data from that offline location.
Instead, we're going to build a process whereby we use the existing technology (assume you're talking to a lawyer and you must use the same phrases the same way, over and over) to ensure that the offline information doesn't become available online.
Yeah, cool. Sounds neat. Now, can you repeat that for me using T-SQL?
Sure. Let's build a method that stores each request to be forgotten by recording the date of the request and the key values (all artificial, of course, so no identifying information). Then, when we run a restore, prior to putting the database back online, we delete the data for every request received between when the backup was taken and the current time.
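Here is one way that process might look, as a rough sketch rather than a finished implementation. It uses Python's built-in sqlite3 module in place of SQL Server so that it runs anywhere, but the SQL statements are plain enough to port to T-SQL. The customer and erasure_request tables, the function names, and the timestamps are all hypothetical. It also assumes the erasure log is kept outside the database being restored, since a log stored inside it would be rolled back by the restore along with everything else.

```python
import sqlite3

def new_customer_db():
    # Hypothetical production table, keyed by an artificial id.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, email TEXT)")
    db.executemany(
        "INSERT INTO customer VALUES (?, ?)",
        [(1, "a@example.com"), (2, "b@example.com"), (3, "c@example.com")],
    )
    db.commit()
    return db

def new_erasure_log():
    # Holds only the artificial key and the request time, no personal data.
    log = sqlite3.connect(":memory:")
    log.execute(
        "CREATE TABLE erasure_request ("
        " customer_id INTEGER NOT NULL, requested_at TEXT NOT NULL)"
    )
    log.commit()
    return log

def forget(live, log, customer_id, requested_at):
    # Erase from production and record the request in the separate log.
    live.execute("DELETE FROM customer WHERE customer_id = ?", (customer_id,))
    log.execute(
        "INSERT INTO erasure_request (customer_id, requested_at) VALUES (?, ?)",
        (customer_id, requested_at),
    )
    live.commit()
    log.commit()

def replay_erasures(restored, log, backup_taken_at):
    # Run after a restore, before the database goes back online: re-apply
    # every erasure requested since the backup was taken.
    ids = log.execute(
        "SELECT customer_id FROM erasure_request WHERE requested_at >= ?",
        (backup_taken_at,),
    ).fetchall()
    restored.executemany("DELETE FROM customer WHERE customer_id = ?", ids)
    restored.commit()
    return len(ids)

def run_demo():
    live, log = new_customer_db(), new_erasure_log()
    forget(live, log, 1, "2018-05-01T09:00:00")  # already gone before the backup

    backup = sqlite3.connect(":memory:")
    live.backup(backup)                          # the nightly backup
    backup_taken_at = "2018-05-02T01:00:00"

    forget(live, log, 2, "2018-05-03T10:30:00")  # request arrives after the backup

    restored = sqlite3.connect(":memory:")
    backup.backup(restored)                      # restore: customer 2 reappears
    replayed = replay_erasures(restored, log, backup_taken_at)
    remaining = [r[0] for r in restored.execute(
        "SELECT customer_id FROM customer ORDER BY customer_id")]
    return replayed, remaining
```

Timestamps are stored as ISO 8601 strings in a single format so that plain string comparison orders them correctly. In a real system the replay step would need to cover every table that holds the person's data, and the restore procedure would need to make that step impossible to skip.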
Assuming we document this process and detail why we're using it, we should be building what I've come to know as a defensible position. A defensible position, as it relates to the GDPR, is a full set of documents and processes that does everything reasonably feasible to meet the requirements of the GDPR. Writing it down as a process, and following it and publishing it, is the key to establishing this defensible position. Without all that work, you are facing serious trouble.
Most of the GDPR, including the right to be forgotten, is not a reason for panic. However, parts of the GDPR, specifically the right to be forgotten, present us with challenges. The key is to build a defensible position for your organization. Document your processes. Show you deal with the right to be forgotten. You'll probably find that, in most cases, the defensible position is just a common sense approach to data management that you should have been using anyway. It's just going to be a lot of work to implement all this.
Published at DZone with permission of Grant Fritchey , DZone MVB. See the original article here.