Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

GDPR Violations Are Hiding in Your Database - Here's How to Find Them

DZone's Guide to

GDPR Violations Are Hiding in Your Database - Here's How to Find Them

There's no hiding from GDPR, even though some violations might be hiding in your database. This post will help you discover what and where they are.

· Security Zone ·
Free Resource

Discover how to provide active runtime protection for your web applications from known and unknown vulnerabilities including Remote Code Execution Attacks.

Recently, I attended the Microsoft Virtual Security Summit, and they posted this poll:

As you can see, 22% of the respondents believe that their organization is not impacted by the GDPR.

I think that many of that 22% have their head in the sand, something I've written about before. I've seen an uptick in conversations around GDPR recently. They go like this: "Hey, we just heard about GDPR, can you help us figure out what we need to do to be compliant?"

The short answer is "maybe." But with only two months before GDPR takes effect, it is unlikely you are going to catch everything. And it is going to be even harder to find data if you don't know where to look. That's why I am writing this today. Because GDPR violations are hiding in your database, and here's how to find them.

And even if the GDPR doesn't apply to you, this post will help you find Personally Identifiable Information (PII) that might be overlooked. I bet even the 22% above are concerned about PII data leaking out.

Data Discovery and Classification

The first step is to start building a process for data discovery and classification. I've written before about the tools Microsoft is making available to help, as well as the limitations these tools have right now. With the GDPR coming there is no better time to get started. But get started you must. The sooner, the better, as you are likely to discover thousands of columns that need classification, masking, and/or encryption.

The Two Datatypes You Must Check

There's a couple of column data types that aren't getting enough attention for the GDPR and/or PII. They are the XML and NVARCHAR(MAX) columns. These columns can be catch-alls for data, and likely contain details that may fall under the scope of GDPR. They are not being flagged by the Data Discovery and Classification tool by default. That's because the tools look at column names, and not the data inside of the columns. You need to be aware of this gap and include a manual check for these columns, to be safe.

How to Find XML Columns in SQL Server

Finding these columns inside your database is easy enough. Here's some sample code to help get you started.

SELECT SCHEMA_NAME(so.schema_id) AS [Schema],
OBJECT_NAME(sc.object_id) AS [Table],
sc.name AS [Column],
'SELECT ' + sc.name + ' FROM ' +
SCHEMA_NAME(so.schema_id) + '.' + 
OBJECT_NAME(sc.object_id) AS [SelectStmt]
FROM sys.columns sc
INNER JOIN sys.objects so ON sc.object_id = so.object_id
WHERE sc.max_length = -1 --varchar(max), nvarchar(max), varbinary(max), or xml 
AND so.type = 'U' --user table

This code snippet will help you find any columns in your database that are varchar(max), nvarchar(max), varbinary(max), or XML. The code is limited to only user tables. The code returns schema, table, and column names and builds a SELECT statement for you to run if desired.

Feel free to take it and modify as you see fit.

When I run it against my version of GalacticWorks it returns 14 columns:

Once we have our list we can get started on examining the data to determine if they contain any PII data and likely GDPR violations you didn't know about.

Parsing the Data

When it comes to parsing this data we have two main options: T-SQL and PowerShell. However, neither of those will help you with the varbinary(max) columns. For those columns, you will need to export the data and view with any tool that can open an image. I included varbinary(max) in the result set because images can contain PII data and therefore you should beware and classify those columns accordingly.

For the varchar(max) and nvarchar(max) columns, you can use T-SQL to easily search for strings. Here is but one example:

SELECT DocumentSummary 
FROM Production.Document
WHERE DocumentSummary LIKE '%SearchForSomething%'

For the XML columns you can also use T-SQL, here's how you can do that:

SELECT * 
FROM  
(SELECT cast (Diagram as nvarchar(max)) [XmlText]
 FROM Production.Illustration) a 
WHERE [XmlText] LIKE '%SearchForSomething%'

Yes, I could also use XQuery here, but, to be honest, XQuery is not the right solution anyway. No, if I was building this out for an enterprise I would use PowerShell to get this done. With PowerShell, I could have a list of keywords to check as well as a list of regex strings to look for patterns (credit card number formats, social insurance number formats, etc.) PowerShell would make this task a lot easier than T-SQL. And PowerShell would let me scale beyond just the one instance of SQL Server, too.

Summary

Life is dirty. So is your data. And that dirt is hiding everywhere. Don't be foolish enough to think you don't have PII data stuffed into a column somewhere. Don't think that because you are located in the US that the GDPR won't apply to you. Get in front of this situation now, before May arrives.

Take the time to do some data discovery. It will be worth your time.

Find out how Waratek’s award-winning application security platform can improve the security of your new and legacy applications and platforms with no false positives, code changes or slowing your application.

Topics:
gdpr ,security ,data security ,database security

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}