GDPR: Top 5 Logging Best Practices One Should Follow
When logging data, be wary of exposing the Personally Identifiable Information (PII) of your users. Read on for best practices to remain GDPR compliant while logging.
The GDPR's rather broad definition of personal data means that log data requires special attention. Personal data in web server logs is a popular topic in many GDPR forums. For example, IP addresses or cookie identifiers might be considered personal data. Consequently, such data may be stored only on a lawful basis, such as customer consent, and only for a limited time. To minimize risk, it is highly recommended to anonymize personal data before you hand logs over to any third party. A good example is anonymizing IP addresses before you send data to Google Analytics.
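A minimal sketch of IP truncation, assuming Python's standard `ipaddress` module and that truncation is an acceptable anonymization step for your use case. The helper name `anonymize_ip` and the prefix lengths are illustrative assumptions: /24 zeroes the last octet of an IPv4 address and /48 zeroes the last 80 bits of an IPv6 address, which mirrors how Google Analytics has described its IP anonymization.

```python
import ipaddress


def anonymize_ip(addr: str) -> str:
    """Zero the host part of an IP address (hypothetical helper).

    IPv4: keep the first 24 bits, so the last octet becomes 0.
    IPv6: keep the first 48 bits, so the last 80 bits become 0.
    Raises ValueError for strings that are not IP addresses.
    """
    ip = ipaddress.ip_address(addr)
    prefix = 24 if ip.version == 4 else 48
    # strict=False accepts a host address and returns its enclosing network.
    network = ipaddress.ip_network((ip, prefix), strict=False)
    return str(network.network_address)


print(anonymize_ip("203.0.113.57"))         # 203.0.113.0
print(anonymize_ip("2001:db8:abcd:12::1"))  # 2001:db8:abcd::
```

Running the helper in the log pipeline, before records leave your infrastructure, means the third party never receives the full address.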
Note that cloud and SaaS providers can't take full responsibility for the data you send them for storage or analytics. In GDPR terms, a service provider often has two roles. The provider typically acts as "Data Controller" for your account data (name, address, e-mail, phone number, etc.). For your content, such as your logs, the role of "Data Processor" might apply, in which case you are the "Data Controller" and are responsible for the logs you send to the cloud service provider.
So what are the best practices that will help you win the GDPR fight?
1. Centralize Log Storage
Centralize your log storage so that you can apply policies in one place. Maintaining policies in multiple places adds complexity and risk. Most log management services support retention policies per data source, so define a reasonable retention time for every log source.
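To make "a retention time for every log source" concrete, here is a minimal Python sketch of a per-source retention check. It assumes log entries carry a source name and a timezone-aware timestamp; the `RETENTION` table, the source names, and the periods are hypothetical placeholders, not recommendations, and a real log management service would enforce this for you.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical retention policy: one explicit entry per log source.
# Source names and periods are placeholders, not recommendations.
RETENTION = {
    "nginx-access": timedelta(days=7),
    "app-backend": timedelta(days=30),
}


@dataclass
class LogEntry:
    source: str
    timestamp: datetime  # must be timezone-aware
    message: str


def is_expired(entry: LogEntry, now: datetime) -> bool:
    try:
        ttl = RETENTION[entry.source]
    except KeyError:
        # Refuse to guess: every source must have an explicit retention period.
        raise KeyError(f"no retention policy for source {entry.source!r}") from None
    return now - entry.timestamp > ttl


def apply_retention(entries, now=None):
    """Return only the entries still inside their source's retention window."""
    now = now or datetime.now(timezone.utc)
    return [e for e in entries if not is_expired(e, now)]
```

Raising on an unknown source, rather than falling back to a default, is a deliberate choice: a new log source cannot slip into storage without someone deciding how long to keep it.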
2. Delete Local Logs From Your Servers (Periodically)
Duplicated data makes policies harder to enforce. Therefore, make sure that logs stored in a central place are removed from local servers as soon as possible. Logrotate is a common tool for rotating and deleting logs periodically (weekly in many default configurations). A log shipper streams the logs to the centralized log storage in near real time.
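Logrotate's own `rotate` and `maxage` directives are the usual way to limit how long rotated files stay on disk. Purely as an illustration of the same idea, here is a minimal Python sketch that deletes rotated files older than a cutoff. It assumes rotated files follow a `name.log.N` naming scheme, that the active `*.log` file must be kept, and that the files were already shipped to central storage; the function name and defaults are hypothetical.

```python
import time
from pathlib import Path


def delete_old_local_logs(log_dir, max_age_days=7, pattern="*.log.*", now=None):
    """Delete rotated log files older than max_age_days (hypothetical helper).

    The default pattern matches rotated files such as app.log.1 or
    app.log.2.gz but not the active app.log. Assumes every matching
    file has already been shipped to central storage.
    Returns the list of deleted paths.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    deleted = []
    for path in Path(log_dir).glob(pattern):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            deleted.append(path)
    return deleted
```

Keying on modification time keeps the sketch simple; a stricter version would also confirm with the shipper that a file was fully forwarded before deleting it.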
3. Structure Your Logs
You can structure logs with parser rules in a log shipper configuration. Structured logs make it easier to mask or anonymize sensitive data, as we point out in the next step. Wherever possible, applications should log directly in a structured format like JSON. A structured log format saves the human time needed to write parser rules, as well as the CPU cycles spent on parsing.
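As one way to log JSON directly from an application, here is a minimal sketch using Python's standard `logging` module with a custom formatter. The `JsonFormatter` class, the `make_logger` helper, and the convention of passing structured fields via `extra={"fields": {...}}` are illustrative assumptions, not part of the standard library.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each LogRecord as a single-line JSON object."""

    def format(self, record):
        payload = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Structured fields passed via extra={"fields": {...}} (our own convention).
        fields = getattr(record, "fields", None)
        if isinstance(fields, dict):
            payload.update(fields)
        # default=str keeps non-JSON values (e.g. datetimes) from raising.
        return json.dumps(payload, default=str)


def make_logger(name, stream=None):
    """Create a logger that writes JSON lines to `stream` (stderr if None)."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers[:] = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

Calling `log.info("user login", extra={"fields": {"user_id": "u-42"}})` on such a logger emits one JSON object per line, which a shipper can forward without parser rules and a masking step can address by field name.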
4. Anonymize Sensitive Data Fields in Logs
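With structured logs, sensitive fields can be handled by name before the logs leave the server. The following Python sketch is an illustration under stated assumptions, not a prescribed method: it assumes log records are dicts, and the field lists (`PSEUDONYMIZE`, `DROP`) and helper names are hypothetical. Fields such as passwords are dropped entirely, while identifiers such as `user_id` or `email` are replaced with a truncated HMAC-SHA256 digest, so records from the same user can still be correlated without exposing the raw value. Note that a keyed hash is pseudonymization rather than anonymization in GDPR terms; the key must be kept secret and out of the log pipeline.

```python
import hashlib
import hmac

# Hypothetical field classification for one application's log records.
PSEUDONYMIZE = {"user_id", "email"}  # replace with a keyed hash (still joinable)
DROP = {"password", "credit_card"}   # never ship these at all


def pseudonymize(value: str, key: bytes) -> str:
    """Deterministic keyed hash; the same input and key give the same token."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # truncated for readability in logs


def mask_record(record: dict, key: bytes) -> dict:
    """Return a copy of `record` with sensitive fields dropped or pseudonymized."""
    masked = {}
    for field, value in record.items():
        if field in DROP:
            continue
        if field in PSEUDONYMIZE and value is not None:
            masked[field] = pseudonymize(str(value), key)
        else:
            masked[field] = value
    return masked
```

For example, `mask_record({"email": "a@example.com", "password": "x", "path": "/"}, key)` keeps `path`, removes `password`, and replaces `email` with a 16-character token.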
5. Encrypt Logs in Transit
Use only encrypted channels to transmit log data to central storage. Logs are often shipped unencrypted over Syslog/UDP for performance reasons. That is bad practice; avoid it. Configure your syslog servers for TLS connections. If you use the Elastic Stack, secure Elasticsearch and check out alternatives to X-Pack.
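A minimal Python sketch of shipping syslog messages over TLS, assuming a collector that accepts syslog over TLS as described in RFC 5425 (IANA port 6514), which frames each message with octet counting ("length, space, message"). The function names are hypothetical, and the sketch assumes messages are already formatted as RFC 5424 syslog lines; in practice you would more likely enable TLS in your syslog daemon or log shipper than write your own client.

```python
import socket
import ssl


def frame_octet_counting(message: str) -> bytes:
    """Frame one syslog message as 'MSG-LEN SP SYSLOG-MSG' (RFC 5425)."""
    data = message.encode("utf-8")
    return str(len(data)).encode("ascii") + b" " + data


def make_tls_context(cafile=None) -> ssl.SSLContext:
    # create_default_context() verifies the server certificate and hostname.
    return ssl.create_default_context(cafile=cafile)


def send_syslog_tls(host, port, messages, cafile=None, timeout=5.0):
    """Open a verified TLS connection and send each framed message."""
    context = make_tls_context(cafile)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            for msg in messages:
                tls.sendall(frame_octet_counting(msg))
```

For example, `send_syslog_tls("logs.example.com", 6514, ["<134>1 2024-01-31T12:00:00Z web01 app - - - user login"])` would send one framed message, with `logs.example.com` standing in for your collector. Keeping certificate and hostname verification on is what protects against a man-in-the-middle, not the encryption alone.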
Some of the above are general logging best practices. With the arrival of the GDPR, following them becomes essential to protect your organization from potential legal issues. Furthermore, if you store European users' data on servers outside the EU, you are effectively exporting PII (Personally Identifiable Information). European law does not allow the export of users' personally identifiable information unless companies can demonstrate that they will protect European users' privacy and data. Thus, if you ship your logs to a log management service, using a service hosted in the EU is another best practice to consider.
Published at DZone with permission of Stefan Thies, DZone MVB.