6 Shades of Masking Your Data

Redgate's data masking project aims to improve the management of sensitive data and to synthesize more realistic data. Here are six different approaches to doing it.

by Toby Smyth · Apr. 22, 17 · Opinion

Foundry is Redgate’s research and development division. We develop products and technologies for the Microsoft data platform. Each project progresses through Foundry’s four-stage product development process: Research, Concept, Prototype, and Beta. At each stage, the Foundry team is exploring the scope and potential for Redgate to develop a product. One of our projects, data masking, has seen us working to improve the management of sensitive data and synthesize more realistic data. To do this, we’ve talked to multiple customers and we’ve come up with six different approaches to data masking.

1. Keeping Track of Sensitive Data

The first step in preventing sensitive data from leaving your databases is knowing where that data is. Keeping track of what data is sensitive and where that data is can be very challenging.

Our first concept application uses machine learning to discover sensitive data in any column of a SQL Server database. It combines a scan of the actual column values with the names of SQL objects to determine whether a column contains sensitive data:

[Image: data masking 1]
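To make the idea of combining column names with column values concrete, here is a minimal Python sketch. It is not how the concept application works: it swaps the machine learning model for a few hand-written heuristics, and the hint list, patterns, and threshold are all assumptions chosen for illustration.

```python
import re

# Hypothetical hints and patterns; a real tool would learn these signals
# from data rather than hard-code them.
NAME_HINTS = ("name", "email", "phone", "ssn", "address", "birth")
VALUE_PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "phone": re.compile(r"^\+?[\d\s().-]{7,}$"),
}


def name_score(column_name):
    """Return 1.0 if the column name contains a known hint, else 0.0."""
    lowered = column_name.lower()
    return 1.0 if any(hint in lowered for hint in NAME_HINTS) else 0.0


def value_score(values):
    """Return the largest fraction of non-empty values matching one pattern."""
    values = [v for v in values if v]
    if not values:
        return 0.0
    best = 0.0
    for pattern in VALUE_PATTERNS.values():
        hits = sum(1 for v in values if pattern.match(str(v)))
        best = max(best, hits / len(values))
    return best


def is_sensitive(column_name, values, threshold=0.5):
    """Flag a column when either the name or the values look sensitive."""
    return max(name_score(column_name), value_score(values)) >= threshold
```

Using both signals means a column called "Col7" that is full of email addresses is still flagged, even though its name gives nothing away.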

2. Masking Rules

In many cases, personal data can be desensitized by applying a couple of basic masking rules to each row. For example, columns with the name "Name" can have their values replaced with a random value chosen from a list of first names and surnames.

In this concept application, we show how applying a few simple masking rules to a table can produce realistic and desensitized data:

[Image: data masking 2]
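A rule of this kind can be sketched in a few lines of Python. The name lists, the `Email` rule, and the function names below are assumptions made up for the example, not part of the concept application.

```python
import random

# Small illustrative lists; a real rule would draw from much larger ones.
FIRST_NAMES = ["Alice", "Bhavna", "Carlos", "Dmitri"]
SURNAMES = ["Jones", "Khan", "Lopez", "Nguyen"]


def random_full_name(rng):
    """Pick a random first name and surname."""
    return f"{rng.choice(FIRST_NAMES)} {rng.choice(SURNAMES)}"


def random_email(rng):
    """Build a synthetic address on the reserved example.com domain."""
    return f"user{rng.randrange(100000)}@example.com"


# Hypothetical rule table: column name -> function producing a replacement.
MASKING_RULES = {"Name": random_full_name, "Email": random_email}


def mask_rows(rows, rules=MASKING_RULES, seed=None):
    """Return copies of the rows with every ruled column replaced."""
    rng = random.Random(seed)
    masked = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        for column, rule in rules.items():
            if column in new_row:
                new_row[column] = rule(rng)
        masked.append(new_row)
    return masked
```

Columns without a rule, such as an `Id` key, pass through unchanged, so the masked rows keep their relationships to other tables.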

3. Sensitive Data in Large Text Fields

Large text fields can pose a big problem when trying to mask sensitive data in a database. Often, these fields hold data that has meaning but no inherent structure, so it’s not sufficient to just "null" them out. At the same time, they can contain unique sensitive data that is hard to detect with traditional masking tools.

This concept application explores how natural language processing could be used to find and replace sensitive data in large amounts of text:

[Image: data masking 3]
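As a rough illustration of find-and-replace inside free text, the Python sketch below uses two regular expressions as a stand-in for natural language processing. The patterns and placeholder tokens are assumptions, and they only cover email addresses and phone-like numbers.

```python
import re

# Deliberately simple patterns, assumed for illustration only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s().-]{6,}\d")


def redact(text):
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    return text


print(redact("Call Sam on +44 20 7946 0958 or mail sam@example.com today."))
# prints: Call Sam on [PHONE] or mail [EMAIL] today.
```

Note that the name "Sam" survives. Patterns cannot tell a person's name from any other word, and that gap is what an NLP technique such as named entity recognition would be expected to fill.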

4. Generate Random Data From Production

Why risk masking data when you can just generate some realistic random data instead? SQL Data Generator is really useful for filling a test or dev database with random data. However, configuring it to create sensible, production-like data can be very time-consuming, especially with lots of tables and columns.

What if we could use example data from production to create a SQL Data Generator configuration in a few simple steps?

[Image: data masking 4]
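One way to picture this is a small Python sketch that inspects example rows and guesses a generator setting for each column. The dictionary it produces is a made-up format for illustration; it is not SQL Data Generator's real configuration format, and the inference rules are assumptions.

```python
def infer_column_config(values, max_categories=10):
    """Guess a generator setting for one column from example values."""
    present = [v for v in values if v is not None]
    null_fraction = (len(values) - len(present)) / len(values) if values else 0.0
    config = {"null_fraction": null_fraction}
    if present and all(isinstance(v, int) for v in present):
        config.update(kind="integer", min=min(present), max=max(present))
    elif present and all(isinstance(v, (int, float)) for v in present):
        config.update(kind="float", min=min(present), max=max(present))
    else:
        distinct = sorted({str(v) for v in present})
        if len(distinct) <= max_categories:
            # Few distinct values: treat the column as a fixed set of options.
            config.update(kind="choice", options=distinct)
        else:
            config.update(kind="text", max_length=max(len(s) for s in distinct))
    return config


def infer_table_config(rows):
    """Build a per-column configuration from a list of row dictionaries."""
    columns = list(rows[0].keys()) if rows else []
    return {c: infer_column_config([r.get(c) for r in rows]) for c in columns}
```

A "choice" column copies its options from the example rows, so in practice this would only be safe for columns that hold non-sensitive values such as country codes.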

5. Data Distribution

One of the problems with generating data row by row is that you lose useful information about the entire data set. For example, if we choose a random value for each row, then the average of the generated data will be quite different from the average of the data in production.

Our next concept application demonstrates how we could generate data that has a similar shape and distribution to the data in production:

[Image: data masking 5]
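The effect is easy to reproduce. The Python sketch below assumes one simple technique, an equal-width histogram, to capture the shape of a numeric column and then samples new values from it. The technique, the function names, and the example numbers are our own choices for illustration, not a description of the concept application.

```python
import random


def fit_histogram(values, bins=10):
    """Summarize values as (lowest value, bin width, count per bin)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid zero width if all values are equal
    counts = [0] * bins
    for v in values:
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    return lo, width, counts


def sample_from_histogram(hist, n, rng):
    """Draw n values: pick a bin by its count, then a point inside that bin."""
    lo, width, counts = hist
    bin_indexes = rng.choices(range(len(counts)), weights=counts, k=n)
    return [lo + (i + rng.random()) * width for i in bin_indexes]


if __name__ == "__main__":
    rng = random.Random(0)
    # Made-up skewed column: mostly small values with a few large ones.
    production = [10.0] * 90 + [1000.0] * 10
    shaped = sample_from_histogram(fit_histogram(production), 5000, rng)
    naive = [rng.uniform(min(production), max(production)) for _ in range(5000)]
    print("production mean:", sum(production) / len(production))
    print("histogram mean: ", sum(shaped) / len(shaped))
    print("uniform mean:   ", sum(naive) / len(naive))
```

With skewed data like this, the uniform sample's mean lands near the middle of the range, while the histogram sample stays much closer to production. Using more bins tightens the match, at the cost of revealing more detail about the original values.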

6. Manage Test Data

Creating test data by hand and scripting all the dependent data is time-consuming and error-prone, and often misses edge cases. Testing with a representative subsample of production data is not much easier, because keeping that subsample up to date and free from sensitive data is just as painful. What happens when your go-to test case closes their account?

Our final concept application looks at how we could help you create and manage test data:

[Image: data masking 6]

Published at DZone with permission of Toby Smyth, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
