
GDPR: A Practical Guide For Developers, Part 1




You've probably heard about the GDPR: the new European data protection regulation that applies to practically everyone. Especially if you work in a big company, there is most likely already a process for bringing your systems into compliance with the regulation.

The regulation is basically a law that must be followed in all EU member states, but it also applies to companies that are not registered in the EU yet have users or customers there. So that's most companies. I will not write yet another "12 facts about GDPR" or "7 myths about GDPR" post or whitepaper, as those are often aimed at managers or legal people. Instead, I'll focus on what GDPR means for developers.

Why am I qualified to do that? A few reasons. I was an advisor to the deputy prime minister of an EU country, and because of that I've been exposed to legislation and have written some myself. I'm familiar with the "legalese" and with how the regulatory framework operates in general. I'm also a privacy advocate and have written about GDPR-related topics in the past, i.e. "before it was cool" (protecting sensitive data, the right to be forgotten). And finally, I'm currently working on a project that (among other things) aims to help with covering some GDPR aspects.

I'll try to be a bit more comprehensive this time and cover as many of the aspects of the regulation that concern developers as I can. While developers will mostly be concerned with how the systems they work on have to change, it's not unlikely that a less-informed manager will storm in during late spring, realizing that the GDPR applies from tomorrow (25 May 2018), and ask "what should we do to get our system/website compliant?"

The rights of the user/client (referred to as the "data subject" in the regulation) that I think are relevant for developers are: the right to erasure (the right to be forgotten, i.e. deleted from the system), the right to restriction of processing (you still keep the data, but mark it as "restricted" and don't touch it without further consent from the user), the right to data portability (the ability to export one's data in a machine-readable format), the right to rectification (the ability to get personal data corrected), the right to be informed (getting human-readable information rather than long terms and conditions), and the right of access (the user should be able to see all the data you have about them).

Additionally, the relevant basic principles are data minimization (you should not collect more data than necessary) and integrity and confidentiality (every security measure to protect the data that you can think of, plus measures that guarantee the data has not been inappropriately modified).

Furthermore, the regulation requires certain processes to be in place within an organization (one with more than 250 employees, or a smaller one whose processing is not occasional, is likely to pose a risk to people's rights, or involves sensitive categories of data). These include keeping a record of all types of processing activities carried out, including transfers to processors (third parties such as cloud service providers). None of the other requirements of the regulation have an exception based on organization size, so "I'm small, GDPR does not concern me" is a myth.

It is important to know what "personal data" is. Basically, it's any piece of data that can be used to uniquely identify a person, or any data about an already identified person. That includes data the user has explicitly provided, but also data you have collected about them, either from third parties or based on their activities on the site (what they've been looking at, what they've purchased, etc.).

With that in mind, here is a list of features that will have to be implemented, with some hints on how to do that.

  • "Forget me": you should have a method that takes a userId and deletes all personal data about that user (if the data was collected on the basis of consent, and not for the performance of a contract or due to a legal obligation). This feature is actually useful for integration tests (to clean up after a test), but it may be hard to implement depending on the data model. In a regular relational data model, deleting a record may be easy, but it can violate foreign key constraints. That leaves you two options: either allow nullable foreign keys (for example, an order usually has a reference to the user who placed it, but when the user requests that their data be deleted, you can set the userId to null), or make sure you delete all related data (e.g. via cascades). The latter may not be desirable, e.g. if the order is used to track available quantities or for accounting purposes. It's a bit trickier for event-sourcing data models or, in extreme cases, ones that include some sort of blockchain/hash chain/tamper-evident data structure. With event sourcing, you should be able to remove a past event and regenerate the intermediate snapshots. For blockchain-like structures, be careful what you put in them and avoid putting users' personal data there. There is the option of using a chameleon hash function, but that's suboptimal. Overall, you must constantly think about how you can delete personal data, and "our data model doesn't allow it" isn't an excuse. What about backups? Ideally, you should keep a separate table of forgotten user IDs, so that each time you restore a backup, you re-forget the forgotten users. This means the table should be in a separate database or have a separate backup/restore process.
  • Notify third parties of erasure: deleting things from your system is one thing, but you are also obligated to inform all third parties to which you have pushed that data. So if you have sent personal data to, say, Salesforce, HubSpot, Twitter, or any cloud service provider, you should call their API for deleting personal data. If you are such a provider, your "forget me" endpoint should obviously be exposed. Calling third-party APIs to remove data is not the full story, though. You also have to make sure the information does not appear in search results. That's tricky, as Google doesn't have an API for removal, only a manual process. Fortunately, this only concerns public profile pages that are crawlable by Google (and other search engines, okay...), but you still have to take measures. Ideally, you should make the removed user's page return a 404 HTTP status so that search engines drop it from their index.
  • Restrict processing: in your admin panel, where there's a list of users, there should be a button labeled "restrict processing." The user settings page should have that button too. When clicked (after the user has read the appropriate information), it should mark the profile as restricted. That means it should no longer be visible to back-office staff or to the public. You can implement that with a simple "restricted" flag in the users table and a few if-clauses here and there.
  • Export data: there should be another button, "export data." When clicked, the user should receive all the data that you hold about them. What exactly that data is depends on the particular use case. Usually, it's at least the data that you delete with the "forget me" functionality, but it may include additional data (e.g. the orders the user has made may not be deleted, but should be included in the dump). The structure of the dump is not strictly defined, but my recommendation would be to reuse schema.org definitions as much as possible, in either JSON or XML. If the data is simple enough, a CSV/XLS export would also be fine. Data exports can sometimes take a long time, so the button can trigger a background process, which then notifies the user via email when their data is ready (Twitter, for example, already does that: you can request all your tweets and you get them after a while). You don't need to implement an automated export, although it would be nice. It's sufficient to have a process in place that allows users to request their data, even if that is a manual database-querying process.
  • Allow users to edit their profile: this seems like an obvious rule, but it isn't always followed. Users must be able to correct all data about them, including data that you have collected from other sources (e.g. via "login with Facebook" you may have fetched their name and address). Rule of thumb: all the fields in your "users" table should be editable via the UI. Technically, rectification can be done via a manual support process, but that's normally more expensive for a business than just having a form for it. There is one other scenario, however: when you've obtained the data from other sources (i.e. the user hasn't provided their details to you directly). In that case, there should still be a page where they can identify themselves (via email and/or SMS confirmation) and get access to the data about them.
  • Consent checkboxes: this is, in my opinion, the biggest change the regulation brings. "I accept the terms and conditions" is no longer sufficient to claim that the user has given their consent for processing their data. So, for each particular processing activity, there should be a separate checkbox on the registration (or user profile) screen. You should keep these consents in separate columns in the database and let users withdraw their consent (by unchecking the checkboxes on their profile page; see the previous point). Ideally, these checkboxes should come directly from the register of processing activities (if you keep one). Note that the checkboxes should not be preselected, as a preselected box does not count as consent.
  • Re-request consent: if the consent users have given was not clear (e.g. if they simply agreed to the terms and conditions), you'll have to obtain that consent again. So prepare functionality for mass-emailing your users, asking them to go to their profile page and review the checkboxes for all the personal data processing activities that you have.
  • "See all my data": this is very similar to the "export" button, except the data should be displayed in the regular UI of the application rather than in XML/JSON format. For example, Google Maps shows you your location history: all the places that you've been to. It is a good implementation of the right of access (though Google is very far from perfect where privacy is concerned). This is not all there is to the right of access, though: you also have to let unregistered users ask whether you have data about them, but that would be a more manual process. The ideal minimum would be a "check by email" feature, where you check whether you have data about a particular email address.
  • Age checks: you should ask for the user's age, and if the user is a child (below 16, or below a lower age of at least 13 where a member state has set one), you should ask for parental consent. There's no prescribed way to do that, but my suggestion is to introduce a flow where the child specifies the email address of a parent, who can then confirm. Obviously, children will just cheat with their birthdate or provide a fake parent email, but you will most likely have done your job according to the regulation (this is one of the "wishful thinking" aspects of the regulation).
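The "forget me" item above combines two ideas: detaching rows that must survive (such as orders) through a nullable foreign key, and a ledger of forgotten user IDs kept in a separate database so erasures can be replayed after a backup restore. Here is a minimal sketch using Python's built-in `sqlite3`; the table and function names (`users`, `orders`, `forgotten_users`, `forget_user`, `reforget_after_restore`) are hypothetical, and a real system would have to cover every table that holds personal data, not just these two.

```python
import sqlite3

# Hypothetical application schema: orders.user_id is nullable on purpose,
# so an order can outlive the user who placed it (e.g. for accounting).
APP_SCHEMA = """
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    user_id INTEGER REFERENCES users(id),  -- nullable foreign key
    total_cents INTEGER NOT NULL
);
"""
# Hypothetical ledger, kept in a separate database with its own backup cycle.
LEDGER_SCHEMA = "CREATE TABLE forgotten_users (user_id INTEGER PRIMARY KEY);"


def _erase(app_db: sqlite3.Connection, user_id: int) -> None:
    with app_db:  # one transaction: detach the orders, then delete the user row
        app_db.execute("UPDATE orders SET user_id = NULL WHERE user_id = ?", (user_id,))
        app_db.execute("DELETE FROM users WHERE id = ?", (user_id,))


def forget_user(app_db: sqlite3.Connection, ledger_db: sqlite3.Connection, user_id: int) -> None:
    # Record the ID first: if the erasure then fails, a later re-forget still covers it.
    with ledger_db:
        ledger_db.execute("INSERT OR IGNORE INTO forgotten_users VALUES (?)", (user_id,))
    _erase(app_db, user_id)


def reforget_after_restore(app_db: sqlite3.Connection, ledger_db: sqlite3.Connection) -> int:
    """Run after restoring an older backup of app_db; returns how many IDs were re-erased."""
    ids = [row[0] for row in ledger_db.execute("SELECT user_id FROM forgotten_users")]
    for user_id in ids:
        _erase(app_db, user_id)
    return len(ids)


# Demo with two in-memory databases
app = sqlite3.connect(":memory:")
app.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked to
app.executescript(APP_SCHEMA)
ledger = sqlite3.connect(":memory:")
ledger.executescript(LEDGER_SCHEMA)

with app:
    app.execute("INSERT INTO users VALUES (1, 'Alice', 'alice@example.com')")
    app.execute("INSERT INTO orders VALUES (10, 1, 4999)")

forget_user(app, ledger, 1)

# Simulate restoring a backup taken before the erasure, then re-forget
with app:
    app.execute("INSERT INTO users VALUES (1, 'Alice', 'alice@example.com')")
    app.execute("UPDATE orders SET user_id = 1 WHERE id = 10")
reforgotten = reforget_after_restore(app, ledger)
```

Writing the ledger entry before the erasure is a deliberate choice in this sketch: if the erasure fails halfway, the next `reforget_after_restore` run still picks the user up.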
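For the event-sourcing case mentioned in the "forget me" item, "remove a past event and regenerate intermediate snapshots" can be illustrated with a pure fold over the event log. This is a sketch under strong assumptions: the `Event` shape, the event kinds, and `SNAPSHOT_EVERY` are hypothetical, and real event stores are usually append-only, so actually dropping events depends on whatever log rewrite or compaction facility your store offers.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    user_id: int
    kind: str  # hypothetical kinds: "registered", "email_changed"
    email: str


SNAPSHOT_EVERY = 2  # hypothetical: a snapshot is stored after every 2 events


def apply(state: dict[int, str], event: Event) -> dict[int, str]:
    """Fold one event into the read model (user_id -> current email). Both kinds set the email."""
    new_state = dict(state)
    new_state[event.user_id] = event.email
    return new_state


def build_snapshots(events: list[Event]) -> list[dict[int, str]]:
    """Replay the whole log, keeping a snapshot after every SNAPSHOT_EVERY events."""
    state: dict[int, str] = {}
    snapshots = []
    for i, event in enumerate(events, start=1):
        state = apply(state, event)
        if i % SNAPSHOT_EVERY == 0:
            snapshots.append(state)
    return snapshots


def forget(events: list[Event], user_id: int) -> tuple[list[Event], list[dict[int, str]]]:
    """Drop the user's events, then regenerate every snapshot from the cleaned log."""
    kept = [e for e in events if e.user_id != user_id]
    return kept, build_snapshots(kept)


log = [
    Event(1, "registered", "alice@example.com"),
    Event(2, "registered", "bob@example.com"),
    Event(1, "email_changed", "alice@work.example"),
    Event(2, "email_changed", "bob@work.example"),
]
old_snapshots = build_snapshots(log)
log, snapshots = forget(log, user_id=1)
```

Rebuilding all snapshots from scratch is the simplest correct approach; an optimization would be to rebuild only from the last snapshot taken before the user's first event.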
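The third-party notification item above boils down to fanning an erasure out to every processor you have shared data with, and remembering which calls failed so they can be retried. A minimal sketch: the `Processor` protocol and its `delete_personal_data` method are hypothetical adapters, not any vendor's real API; each adapter would wrap whatever deletion endpoint Salesforce, HubSpot, or another provider documents. The email address is used as the key on the assumption that third parties don't know your internal userId.

```python
from typing import List, Protocol


class Processor(Protocol):
    """Hypothetical adapter around one third party's deletion API."""

    name: str

    def delete_personal_data(self, email: str) -> None: ...


def propagate_erasure(processors: List[Processor], email: str) -> List[str]:
    """Ask every processor to delete the data; return the names of those that failed."""
    failed = []
    for processor in processors:
        try:
            processor.delete_personal_data(email)
        except Exception:  # network errors, HTTP 5xx, etc., as raised by the adapter
            failed.append(processor.name)
    return failed


# Demo with fake adapters
class FakeCrm:
    name = "crm"

    def __init__(self):
        self.deleted = []

    def delete_personal_data(self, email: str) -> None:
        self.deleted.append(email)


class FlakyMailer:
    name = "mailer"

    def delete_personal_data(self, email: str) -> None:
        raise ConnectionError("timeout")


crm = FakeCrm()
failed = propagate_erasure([crm, FlakyMailer()], "alice@example.com")
```

The returned list of failures would typically be persisted and retried by a background job, since an erasure request is not complete until every recipient has been told.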
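The "restricted flag plus a few if-clauses" from the restrict-processing item can look like this. A sketch with `sqlite3`; the `restricted` column and the `restrict_processing`, `back_office_user_list`, and `public_profile` functions are hypothetical names. Making restricted (or deleted) profiles answer with a 404 also covers the search-engine point from the third-party item.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    restricted INTEGER NOT NULL DEFAULT 0  -- 1 = processing restricted by the user
);
INSERT INTO users (id, name) VALUES (1, 'Alice'), (2, 'Bob');
""")


def restrict_processing(user_id: int) -> None:
    with db:
        db.execute("UPDATE users SET restricted = 1 WHERE id = ?", (user_id,))


def back_office_user_list() -> list:
    # Restricted profiles never reach the admin panel
    rows = db.execute("SELECT name FROM users WHERE restricted = 0 ORDER BY id")
    return [name for (name,) in rows]


def public_profile(user_id: int) -> tuple:
    """Return (HTTP status, body); restricted and missing users look the same: 404."""
    row = db.execute(
        "SELECT name FROM users WHERE id = ? AND restricted = 0", (user_id,)
    ).fetchone()
    if row is None:
        return 404, "Not Found"
    return 200, f"Profile of {row[0]}"


restrict_processing(1)
```

Putting the filter in the SQL rather than in the rendering code makes it harder for a new screen to forget the check, though a database view that excludes restricted rows would be an even stronger default.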
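For the export item, reusing schema.org definitions in JSON could mean emitting JSON-LD with a `Person` node and `Order` nodes linked through the `customer` property. A minimal sketch: the input dictionary shapes and the `export_user` function are hypothetical, only a few schema.org properties are used, and the background job that would call it is left out.

```python
import json


def export_user(user: dict, orders: list) -> str:
    """Serialize what we hold about one user as schema.org JSON-LD."""
    person_id = f"_:user-{user['id']}"  # blank-node ID linking the orders to the person
    graph = [{
        "@type": "Person",
        "@id": person_id,
        "name": user["name"],
        "email": user["email"],
    }]
    for order in orders:
        graph.append({
            "@type": "Order",
            "orderNumber": str(order["id"]),
            "orderDate": order["date"],  # ISO 8601 date string
            "customer": {"@id": person_id},
        })
    return json.dumps({"@context": "https://schema.org", "@graph": graph}, indent=2)


dump = export_user(
    {"id": 1, "name": "Alice", "email": "alice@example.com"},
    [{"id": 10, "date": "2018-03-01"}],
)
```

The same function can back the "see all my data" page as well, with the UI rendering the parsed structure instead of offering the raw file.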
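The rule of thumb in the profile-editing item ("all the fields in your users table should be editable via the UI") can be turned into an automated check that fails when someone adds a column without exposing it in the profile form. A sketch using SQLite's `PRAGMA table_info`; `PROFILE_FORM_FIELDS` and `SYSTEM_COLUMNS` are hypothetical stand-ins for however your UI declares its fields.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE users (
    id INTEGER PRIMARY KEY, name TEXT, email TEXT, address TEXT, facebook_name TEXT)""")

# Hypothetical: the fields the profile form currently renders
PROFILE_FORM_FIELDS = {"name", "email", "address"}
# Hypothetical: technical columns that are not personal data to be corrected
SYSTEM_COLUMNS = {"id"}


def uneditable_columns(conn: sqlite3.Connection, table: str = "users") -> list:
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    # The table name comes from code, not user input, so the f-string is safe here.
    columns = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    return sorted(columns - PROFILE_FORM_FIELDS - SYSTEM_COLUMNS)
```

Run as part of the test suite, `uneditable_columns(db)` returning anything other than an empty list means some personal data (here, a name fetched via "login with Facebook") cannot be rectified by the user.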
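The consent-checkbox item suggests generating checkboxes from the register of processing activities, storing each consent in its own column, and never preselecting. A sketch under assumptions: the `ProcessingActivity` record, the `REGISTER` contents, and the `consent_<key>` column naming are all hypothetical, and `LawfulBasis` lists only three of the regulation's six lawful bases.

```python
import sqlite3
from dataclasses import dataclass
from enum import Enum


class LawfulBasis(Enum):  # subset of the GDPR's lawful bases
    CONSENT = "consent"
    CONTRACT = "contract"
    LEGAL_OBLIGATION = "legal_obligation"


@dataclass(frozen=True)
class ProcessingActivity:
    key: str          # used to derive the consent column name
    description: str  # text shown next to the checkbox
    basis: LawfulBasis


# Hypothetical register of processing activities
REGISTER = [
    ProcessingActivity("newsletter", "Send me the monthly newsletter", LawfulBasis.CONSENT),
    ProcessingActivity("profiling", "Personalize recommendations from my purchases", LawfulBasis.CONSENT),
    ProcessingActivity("invoicing", "Issue invoices for my orders", LawfulBasis.LEGAL_OBLIGATION),
]


def consent_checkboxes() -> list:
    """One unticked checkbox per consent-based activity; other bases need no checkbox."""
    return [
        {"name": f"consent_{a.key}", "label": a.description, "checked": False}
        for a in REGISTER if a.basis is LawfulBasis.CONSENT
    ]


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
for box in consent_checkboxes():
    # One column per activity: NULL = never asked, 0 = refused or withdrawn, 1 = given.
    # Column names come from the register in code, never from user input.
    db.execute(f"ALTER TABLE users ADD COLUMN {box['name']} INTEGER")

ALLOWED = {box["name"] for box in consent_checkboxes()}


def set_consent(user_id: int, column: str, given: bool) -> None:
    if column not in ALLOWED:  # whitelist, since column names can't be bound as parameters
        raise ValueError(f"unknown consent: {column}")
    with db:
        db.execute(f"UPDATE users SET {column} = ? WHERE id = ?", (int(given), user_id))


with db:
    db.execute("INSERT INTO users (id, email) VALUES (1, 'alice@example.com')")
set_consent(1, "consent_newsletter", True)
set_consent(1, "consent_newsletter", False)  # withdrawn from the profile page
```

Distinguishing NULL ("never asked") from 0 ("refused or withdrawn") is a design choice that makes it easy to find users whose consent still has to be requested.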
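For the re-request-consent item, the mass mailing starts with finding users who never gave explicit consent. This sketch assumes a hypothetical schema where a NULL consent column means the user only ever accepted the old terms and conditions; `PROFILE_URL` and the sender address are placeholders. It builds messages with the standard library's `email.message.EmailMessage` but does not send them; that step would go through `smtplib` or your mailing service.

```python
import sqlite3
from email.message import EmailMessage

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    email TEXT NOT NULL,
    consent_newsletter INTEGER,  -- NULL = only accepted the old terms and conditions
    consent_profiling INTEGER
);
INSERT INTO users VALUES
    (1, 'alice@example.com', NULL, NULL),
    (2, 'bob@example.com', 1, 0);
""")

PROFILE_URL = "https://example.com/profile#consent"  # hypothetical


def users_needing_consent() -> list:
    rows = db.execute(
        "SELECT email FROM users "
        "WHERE consent_newsletter IS NULL OR consent_profiling IS NULL ORDER BY id"
    )
    return [email for (email,) in rows]


def build_reconsent_email(to: str) -> EmailMessage:
    msg = EmailMessage()
    msg["To"] = to
    msg["From"] = "privacy@example.com"  # hypothetical sender
    msg["Subject"] = "Please review how we use your data"
    msg.set_content(f"We have updated our consent options. Please review them at {PROFILE_URL}")
    return msg


outbox = [build_reconsent_email(addr) for addr in users_needing_consent()]
```

Users who refused (0) are deliberately not emailed again; only those who were never asked explicitly are.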
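The "check by email" minimum from the "see all my data" item is essentially a lookup across every table that can hold an email address. A sketch with hypothetical table names in `EMAIL_TABLES`; following the identification advice in the profile-editing item, it assumes the requester has already confirmed ownership of the address, since otherwise the answer itself leaks information.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
CREATE TABLE newsletter_signups (email TEXT);
CREATE TABLE imported_leads (email TEXT, source TEXT);
INSERT INTO newsletter_signups VALUES ('carol@example.com');
INSERT INTO imported_leads VALUES ('Dave@Example.com', 'trade fair');
""")

# Hypothetical: every table that may contain an email address
EMAIL_TABLES = ["users", "newsletter_signups", "imported_leads"]


def tables_with_data_about(email: str) -> list:
    """Call only after the requester has confirmed they own this address."""
    normalized = email.strip().lower()
    found = []
    for table in EMAIL_TABLES:  # names come from the constant above, not from user input
        hit = db.execute(
            f"SELECT 1 FROM {table} WHERE lower(email) = ? LIMIT 1", (normalized,)
        ).fetchone()
        if hit:
            found.append(table)
    return found
```

Returning table names rather than a plain yes/no gives support staff a starting point for the more manual follow-up the item describes.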
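The parental-consent flow suggested in the age-check item can be sketched as: compute the age from the birthdate, and below the threshold issue a one-time token that would be emailed to the parent. The status strings, `pending_parental_consent` store, and function names are hypothetical, and the actual email sending is left out. The threshold is 16 by default, but each member state may lower it to as little as 13, so it is a configuration value here.

```python
from __future__ import annotations

import secrets
from datetime import date
from typing import Dict, Optional

DIGITAL_CONSENT_AGE = 16  # GDPR default; a member state may set a lower age, not below 13

# Hypothetical in-memory store: token -> user_id awaiting a parent's confirmation
pending_parental_consent: Dict[str, int] = {}


def age_on(birthdate: date, today: date) -> int:
    """Whole years completed on `today`."""
    had_birthday = (today.month, today.day) >= (birthdate.month, birthdate.day)
    return today.year - birthdate.year - (0 if had_birthday else 1)


def start_registration(user_id: int, birthdate: date,
                       parent_email: Optional[str], today: date) -> str:
    if age_on(birthdate, today) >= DIGITAL_CONSENT_AGE:
        return "active"
    if not parent_email:
        return "parent_email_required"
    token = secrets.token_urlsafe(16)  # unguessable one-time token
    pending_parental_consent[token] = user_id
    # In a real system: email a confirmation link containing `token` to parent_email
    return "awaiting_parent"


def confirm_parental_consent(token: str) -> Optional[int]:
    """Called when the parent clicks the link; returns the user to activate, if any."""
    return pending_parental_consent.pop(token, None)
```

As the item notes, this does not stop a child from lying about their birthdate or inventing a parent's address; it only documents that a reasonable check was made.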

Tune back in tomorrow when we'll cover some do's and don'ts developers should keep in mind when the GDPR kicks in. 


Published at DZone with permission of

Opinions expressed by DZone contributors are their own.
