GDPR: A Practical Guide For Developers, Part 2
A lot of what has been written on the GDPR hasn't been written from a dev's point of view, so one developer wrote out some good do's and don'ts.
Welcome back! If you missed Part 1, you can check it out here!
Now some "do's," which are mostly about the technical measures needed to protect personal data (outlined in Article 32). They may be more "ops" than "dev," but often the application also has to be extended to support them. I've listed most of what I could think of in a previous post.
- Encrypt the data in transit. That means that communication between your application layer and your database (or your message queue, or whatever component you have) should be over TLS. The certificates could be self-signed (and possibly pinned), or you could have an internal CA. Different databases have different configurations, so just google "X encrypted connections". Some databases also gossip among their nodes - that traffic should be encrypted as well.
- Encrypt the data at rest - this again depends on the database (some offer table-level encryption), but can also be done on the machine-level, e.g. using LUKS. The private key can be stored in your infrastructure, or in some cloud service like AWS KMS.
- Encrypt your backups - kind of obvious.
- Implement pseudonymisation - the most obvious use-case is when you want to use production data for the test/staging servers. You should change the personal data to some "pseudonym," so that the people cannot be identified. When you push data for machine learning purposes (to third parties or not), you can also do that. Technically, that could mean that your User object can have a "pseudonymize" method which applies hash+salt/bcrypt/PBKDF2 for some of the data that can be used to identify a person.
- Protect data integrity - this is a very broad thing, and could simply mean "have authentication mechanisms for modifying data." But you can do something more, even as simple as a checksum, or a more complicated solution (like the one I'm working on). It depends on the stakes, on the way data is accessed, on the particular system, etc. The checksum can be in the form of a hash of all the data in a given database record, which should be updated each time the record is updated through the application. It isn't a strong guarantee, but it is at least something.
- Have your GDPR register of processing activities in something other than Excel - Article 30 says that you should keep a record of all the types of activities that you use personal data for. That sounds like bureaucracy, but it may be useful - you will be able to link certain aspects of your application with that register (e.g. the consent checkboxes, or your audit trail records). It wouldn't take much time to implement a simple register, but the business requirements for that should come from whoever is responsible for the GDPR compliance. But you can advise them that having it in Excel won't make it easy for you as a developer (imagine having to fetch the excel file internally, so that you can parse it and implement a feature). Such a register could be a microservice/small application deployed separately in your infrastructure.
- Log access to personal data - every read operation on a personal data record should be logged, so that you know who accessed what and for what purpose. This does not follow directly from the provisions of the regulation, but it is implied by the accountability principle. What about search results (or lists) that contain personal data about multiple subjects? My hunch is that simply logging "user X did a search for criteria Y" would suffice. But don't display too much personal data in lists - for example, see how Facebook makes you go through some hoops to get a person's birthday.
- Register all API consumers - you shouldn't allow anonymous API access to personal data. I'd say you should request the organization name and contact person for each API user upon registration, and add those to the data processing register. Note: some have treated Article 30 as a requirement to keep an audit log. I don't think it is saying that - instead it requires organizations with 250 or more employees to keep a register of the types of processing activities (i.e. what you use the data for). There are other articles in the regulation that imply that keeping an audit log is a best practice (for protecting the integrity of the data as well as to make sure it hasn't been processed without a valid reason).
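The encryption-in-transit point above can be sketched with Python's standard `ssl` module. This is only a client-side sketch: the function name is mine, and the exact way you hand the context to your database driver varies per driver (most accept an `ssl`/`sslcontext` connection parameter). The idea is simply that the client trusts only your internal CA (or a pinned self-signed certificate) and refuses anything unverified:

```python
import ssl

def make_db_tls_context(ca_file=None):
    """Client-side TLS context for talking to a database or message queue.

    ca_file points at your internal CA bundle, or at the server's
    self-signed certificate (which effectively pins it). With no
    argument, the system trust store is used."""
    ctx = ssl.create_default_context(cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse legacy protocols
    ctx.verify_mode = ssl.CERT_REQUIRED           # reject unverified servers
    ctx.check_hostname = True                     # and mismatched hostnames
    return ctx
```

You'd then pass the returned context to whatever connect call your driver exposes.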
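The pseudonymisation bullet above (a "pseudonymize" method on your User object) might look roughly like this. It's a minimal sketch: the `User` shape and field choices are illustrative, and I'm using PBKDF2 from the standard library as the hashing scheme. Reusing one salt per export keeps the pseudonyms consistent, so records stay joinable without being identifiable:

```python
import hashlib

def pseudonymize(value, salt):
    """Derive a stable pseudonym for an identifying field via PBKDF2.
    Reuse the same salt within one export so foreign keys still match."""
    digest = hashlib.pbkdf2_hmac("sha256", value.encode("utf-8"), salt, 100_000)
    return digest.hex()

class User:
    def __init__(self, name, email, last_login):
        self.name = name
        self.email = email
        self.last_login = last_login  # not identifying, kept as-is

    def pseudonymized(self, salt):
        """Return a copy safe for test/staging servers or ML exports."""
        return User(pseudonymize(self.name, salt),
                    pseudonymize(self.email, salt),
                    self.last_login)
```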
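The per-record checksum mentioned in the data-integrity bullet can be as simple as hashing a canonical serialization of the record. A sketch (the function names are mine; in practice the checksum would be stored in an extra column and recomputed on every application-level update):

```python
import hashlib
import json

def record_checksum(record):
    """Hash all fields of a record in a canonical order. Store the result
    alongside the row; recompute whenever the application updates it."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def verify_record(record, stored_checksum):
    """True if the record still matches the checksum we stored for it."""
    return record_checksum(record) == stored_checksum
```

As the article says, this isn't a strong guarantee - anyone who can modify the row can usually also modify the checksum - but it catches accidental or out-of-band changes.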
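The "register of processing activities in something other than Excel" bullet could start as small as this. The field names below are illustrative, loosely following what Article 30 asks you to record - they are not prescribed by the regulation:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ProcessingActivity:
    # Roughly the Article 30 record fields; names are illustrative
    name: str
    purpose: str
    data_categories: List[str]   # e.g. ["email", "name"]
    recipients: List[str]        # third parties the data goes to
    retention: str               # how long the data is kept

class ProcessingRegister:
    """In-memory sketch. In practice this would be a small service with
    a REST API, so consent checkboxes and audit records can link to it."""
    def __init__(self):
        self._activities = []

    def add(self, activity):
        self._activities.append(activity)

    def purposes_for(self, category):
        """All purposes for which a given category of data is processed."""
        return [a.purpose for a in self._activities
                if category in a.data_categories]
```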
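The access-logging bullet ("every read operation should be logged") is a natural fit for a decorator. A sketch, assuming your data-access functions take the acting user's id and the data subject's id as their first two arguments (the names and the `purpose` parameter are mine):

```python
import functools
import logging

audit_log = logging.getLogger("personal-data-access")

def logs_access(purpose):
    """Record every read of personal data: who accessed whom, and why."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(actor_id, subject_id, *args, **kwargs):
            audit_log.info("actor=%s read subject=%s purpose=%s",
                           actor_id, subject_id, purpose)
            return fn(actor_id, subject_id, *args, **kwargs)
        return inner
    return wrap

@logs_access(purpose="customer-support")
def get_profile(actor_id, subject_id):
    return {"subject": subject_id}  # stand-in for the real lookup
```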
Finally, some "Don'ts":
- Don't use data for purposes that the user hasn't agreed to - that's the spirit of the regulation. If you want to expose a new API to a new type of client, or you want to use the data for some machine learning, or you decide to add ads to your site based on users' behavior, or sell your database to a third party - think twice. I would imagine your register of processing activities could have a button to send notification emails to users to ask them for permission when a new processing activity is added (or if you use a 3rd party register, it should probably give you an API). So upon adding a new processing activity (and adding that to your register), mass email all users from whom you'd like consent.
- Don't log personal data - getting rid of the personal data from log files (especially if they are shipped to a third party service) can be tedious or even impossible. So log just identifiers if needed. And make sure old logs files are cleaned up, just in case.
- Don't put fields on the registration/profile form that you don't need - it's always tempting to add as many fields as the usability person/designer will allow, but unless you absolutely need the data to deliver your service, you shouldn't collect it. You should probably always collect names, but unless you are delivering something physical, a home address or phone number is unnecessary.
- Don't assume third parties are compliant - you are responsible if there's a data breach in one of the third parties (e.g. "processors") to which you send personal data. So before you send data via an API to another service, make sure they have at least a basic level of data protection. If they don't, raise a flag with management.
- Don't assume having ISO XXX makes you compliant - information security standards and even personal data standards are a good start and they will probably have 70% of what the regulation requires, but they are not sufficient - most of the things listed above are not covered in any of those standards.
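The consent-notification idea from the first "don't" above (mass-emailing users when a new processing activity is registered) could be sketched like this. Everything here is hypothetical wiring: the function name is mine, and `send_email` is injected so the delivery channel (SMTP, a queue, a 3rd-party API) stays pluggable:

```python
def request_consent(register, users, new_activity, send_email):
    """Record a new processing activity and ask every affected user
    for consent. Returns the list of addresses notified."""
    register.append(new_activity)
    notified = []
    for user in users:
        send_email(user["email"],
                   subject="We'd like your consent: " + new_activity["name"],
                   body=new_activity["purpose"])
        notified.append(user["email"])
    return notified
```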
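For the "don't log personal data" point above, one defensive measure is a logging filter that scrubs obvious identifiers before records reach any handler (including third-party log shippers). A minimal sketch that redacts email addresses - real deployments would need patterns for whatever identifiers your system handles, and note it only inspects the message, not format arguments:

```python
import logging
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

class RedactingFilter(logging.Filter):
    """Scrub email addresses from log messages before any handler sees them."""
    def filter(self, record):
        record.msg = EMAIL.sub("[redacted]", str(record.msg))
        return True  # never drop the record, only rewrite it
```

Attach it with `logger.addFilter(RedactingFilter())` on the loggers (or handlers) that might see personal data.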
Overall, the purpose of the regulation is to make you make conscious decisions when processing personal data. It imposes best practices in a legal way. If you follow the above advice and design your data model, storage, data flow, and API calls with data protection in mind, then you shouldn't worry about the huge fines that the regulation prescribes - they are for extreme cases, like Equifax for example. Regulators (data protection authorities) will most likely have some checklists you'd have to fit into, but if you follow best practices, that shouldn't be an issue.
I think all of the above features can be implemented in a few weeks by a small team. Be suspicious when a big vendor offers you a generic plug-and-play "GDPR compliance" solution. GDPR is not just about the technical aspects listed above - it does have organizational/process implications. But also be suspicious if a consultant claims GDPR is complicated. It's not - it relies on a few basic principles that are in fact best practices anyway. Just don't ignore them.
Published at DZone with permission of Bozhidar Bozhanov , DZone MVB. See the original article here.