Data Masking: Static vs Dynamic


By Max Tardiveau · Sep. 08, 22 · Analysis

The problem of data masking comes up surprisingly often in the world of IT. Any time you need to share potentially sensitive data, you may need to hide, obfuscate, randomize, or otherwise disguise some of that data. We'll call this the secret data.
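To make "hide, obfuscate, randomize" a little more concrete, here is a minimal Python sketch of a few common masking transformations. The function names, the `XXXX` placeholder, and the use of a truncated, salted SHA-256 hash for pseudonymization are illustrative assumptions, not the API of any particular masking product.

```python
import hashlib
import random


def redact(value: str) -> str:
    """Hide: replace the value entirely with a fixed placeholder."""
    return "XXXX"


def partial_mask(value: str, visible: int = 4) -> str:
    """Obfuscate: keep only the last `visible` characters (e.g. card numbers)."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def pseudonymize(value: str, salt: str) -> str:
    """Disguise: same input gives the same token, but the original is not shown.

    Illustrative only: a truncated salted hash of low-entropy data can be
    brute-forced if the salt leaks.
    """
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:12]


def randomize(value: int, rng: random.Random, spread: int = 1000) -> int:
    """Randomize: replace a number with a random value within +/- spread."""
    return value + rng.randint(-spread, spread)


if __name__ == "__main__":
    print(partial_mask("4111111111111111"))  # ************1111
    print(pseudonymize("alice@example.com", salt="s3cret"))
    print(randomize(95_000, random.Random(42)))
```

Deterministic pseudonymization is handy when masked data still needs to be joined across tables; random noise preserves rough magnitudes for analytics but breaks exact lookups.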

In this article, we'll focus on the mechanics of data masking and gloss over a massive issue: data classification -- knowing who can access what data. Data classification is a whole different problem, especially in organizations with huge amounts of sensitive data. I'll refer you to a different article that touches on this topic. For the rest of this article, we'll assume that this problem has been solved and that we know who can access what data. The question is -- how do we hide the secret data?

Data masking is not just for databases -- it can be applied to documents, spreadsheets, and so on, but here we'll focus on databases.

There are many ways to do data masking, but in general, they fall into two categories, static and dynamic, each with its own upsides and downsides.

What Is Static Masking?

Static masking is the simplest solution. Given a database that contains some secret data, you copy that database and edit the copy to mask whatever data needs to be masked. You can then provide the copy to the client, and they can do whatever they want with it.

Of course, this may not be a trivial process for a large data set. Imagine a relational database with thousands of tables and billions of rows (or more). But there are some (expensive) tools that will help you with that task.

Advantages

It should be obvious that static masking is a very clean concept. It's the same idea as taking a pair of scissors and cutting out parts of a document. The secret data is not present, or at least not readable, in the copy, so there is no risk of leaking it. The end user simply does not have the secret data.

For simple databases, you may not even need any tools: a few simple SQL scripts (or whatever language your database uses) might be enough.
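As a minimal sketch of the "few simple SQL scripts" approach, the Python below uses SQLite as a stand-in database: it makes a full copy with `sqlite3.Connection.backup`, then runs a masking script against the copy only. The `customers` table, its columns, and the replacement formats are hypothetical; a real database would use its own copy and scripting tools.

```python
import sqlite3

# Hypothetical masking script: table, columns and replacement formats
# are assumptions made for illustration.
MASKING_SCRIPT = """
UPDATE customers SET ssn = 'XXX-XX-' || substr(ssn, -4);
UPDATE customers SET email = 'user' || id || '@example.com';
UPDATE customers SET name = 'Customer ' || id;
"""


def make_masked_copy(source: sqlite3.Connection) -> sqlite3.Connection:
    """Copy the whole database, then run the masking script on the copy only."""
    copy = sqlite3.connect(":memory:")
    source.backup(copy)                 # physical copy of the source database
    copy.executescript(MASKING_SCRIPT)  # edit the copy; the source is untouched
    return copy


if __name__ == "__main__":
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, ssn TEXT, email TEXT)")
    src.execute("INSERT INTO customers VALUES (1, 'Alice Smith', '123-45-6789', 'alice@corp.example')")
    src.commit()

    masked = make_masked_copy(src)
    print(masked.execute("SELECT name, ssn, email FROM customers").fetchall())
    # [('Customer 1', 'XXX-XX-6789', 'user1@example.com')]
    print(src.execute("SELECT ssn FROM customers").fetchall())
    # [('123-45-6789',)]
```

The masked copy is what you hand to the client; since the secret values were overwritten before it left your hands, there is nothing left in it to leak.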

Because the secret data is not present, you can give a physical copy of the masked database to the client and let them run it on their own machines.

Disadvantages

The duplication of the data can be a problem. It requires more storage and puts one more copy of the database into circulation. This is usually not a problem if, for instance, you are releasing a database to the public, since there will then be only one version of the masked database.

But if different clients have different requirements, you may need to make many copies of the database, each one with a potentially different set of rules about which data is masked. And, of course, if you have different rules for different clients, you now have to worry about each client getting access only to their own custom version of the data set and not anyone else's. It can get challenging to track all that.
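The bookkeeping problem grows with the number of clients. The sketch below, with hypothetical client names and rules, keeps one rule set per client and produces one masked copy per client from the same rows; every entry in the result is another copy to store, refresh, and deliver only to its owner.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Rule = Callable[[object], object]

# Hypothetical per-client rule sets: column name -> masking function.
CLIENT_RULES: Dict[str, Dict[str, Rule]] = {
    "analytics_vendor": {"name": lambda v: "REDACTED", "ssn": lambda v: None},
    "auditor": {"ssn": lambda v: "XXX-XX-" + str(v)[-4:]},
}


def mask_rows(rows: List[Row], rules: Dict[str, Rule]) -> List[Row]:
    """Apply one client's rules; columns without a rule are copied as-is."""
    return [
        {col: rules[col](val) if col in rules else val for col, val in row.items()}
        for row in rows
    ]


def build_client_copies(
    rows: List[Row], client_rules: Dict[str, Dict[str, Rule]]
) -> Dict[str, List[Row]]:
    """One masked copy per client: N clients means N copies to manage."""
    return {client: mask_rows(rows, rules) for client, rules in client_rules.items()}


if __name__ == "__main__":
    rows = [{"id": 1, "name": "Alice", "ssn": "123-45-6789"}]
    for client, copy in build_client_copies(rows, CLIENT_RULES).items():
        print(client, copy)
```

Note that this sketch passes unlisted columns through unchanged; a more cautious design would mask any column that has no explicit rule, so that a newly added column cannot leak by default.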

Another problem is that the copies are snapshots of the database and may need to be refreshed at regular intervals. Each refresh is another opportunity for a mistake.

Finally, we live in the era of big data. Some data sets are truly enormous, and making and distributing a copy of such data sets can be a daunting proposition.

What Is Dynamic Masking?

Dynamic masking takes a different approach. Instead of copying the data and changing the copy, you modify the data on the fly as it is accessed, before it reaches the user. Each user of the same database can therefore see a different view of the data. Note that this does not change the data stored in the database; it only affects what the user sees.
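Here is a minimal sketch of the idea, assuming a masking layer that sits between the client and the database (SQLite stands in for the database). The roles in `CLEAR_COLUMNS` and the `query_as` function are hypothetical. Every role queries the same table, each gets a different view, and the stored rows never change.

```python
import sqlite3

# Hypothetical policy: the columns each role may see in clear.
# Anything not listed is masked, and unknown roles see nothing in clear.
CLEAR_COLUMNS = {
    "hr_manager": {"id", "name", "salary"},
    "analyst": {"id", "salary"},
    "support": {"id", "name"},
}


def query_as(conn: sqlite3.Connection, role: str, sql: str, params=()) -> list:
    """Run a query, then mask every result column the role may not see."""
    cur = conn.execute(sql, params)
    columns = [d[0] for d in cur.description]
    allowed = CLEAR_COLUMNS.get(role, set())
    return [
        tuple(val if col in allowed else "****" for col, val in zip(columns, row))
        for row in cur.fetchall()
    ]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (id INTEGER, name TEXT, salary INTEGER)")
    conn.execute("INSERT INTO employees VALUES (1, 'Alice', 95000)")
    sql = "SELECT id, name, salary FROM employees"
    print(query_as(conn, "analyst", sql))  # [(1, '****', 95000)]
    print(query_as(conn, "support", sql))  # [(1, 'Alice', '****')]
    print(conn.execute(sql).fetchall())    # [(1, 'Alice', 95000)]: stored data unchanged
```

Masking by result-column name is deliberately naive: a `support` user who runs `SELECT salary AS name FROM employees` gets real salaries back. This is a small taste of why masking reliably in front of a full query language is hard.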

This assumes that you control the database and that the client is accessing it through some sort of network. If the users controlled the database, they could easily bypass the masking.

Generally speaking, dynamic masking can be done either by the database itself or by a layer between the database server and the database client.

For instance, Microsoft SQL Server offers built-in dynamic data masking, which may be sufficient for many scenarios. It is a powerful feature, but it does have some limitations. PostgreSQL has the PostgreSQL Anonymizer extension.

Some third-party solutions provide data masking outside of the database, but they typically rely on special drivers or special clients. A more generalized approach is based on proxy filtering, which relies on deep packet inspection and modification to mask data before it reaches the client.

Advantages

The biggest advantage of dynamic masking is that, in theory, it allows you to use just one database for everyone. This avoids most of the issues we identified earlier with static masking.

Dynamic data masking also means that you can update the data masking rules, typically on the fly, and restrict or broaden access to certain data for certain clients at any time. And masking can depend on more than just who the user is: it can also depend on their IP address, the time of day, or what DEFCON level we're at -- you get the picture.
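Because the decision is made per request, a masking rule can look at the whole request context, not just the user. The following sketch is one hypothetical way to express such a policy in Python; the privileged user, trusted network, business hours, and DEFCON threshold are all illustrative assumptions. Changing any of these settings affects the very next query, with no copies to regenerate.

```python
from dataclasses import dataclass
from datetime import time
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class AccessContext:
    """Everything known about one request at the moment it is made."""
    user: str
    client_ip: str
    local_time: time
    defcon: int  # 5 = normal readiness, 1 = highest alert


# Hypothetical policy settings; editing them changes behavior for the next query.
PRIVILEGED_USERS = {"alice"}
TRUSTED_NETWORK = ip_network("10.0.0.0/8")
BUSINESS_HOURS = (time(8, 0), time(18, 0))


def may_see_clear(ctx: AccessContext) -> bool:
    """Return True only if every condition for unmasked access holds."""
    if ctx.defcon <= 2:
        return False  # elevated alert: mask for everyone
    if ctx.user not in PRIVILEGED_USERS:
        return False
    if ip_address(ctx.client_ip) not in TRUSTED_NETWORK:
        return False
    start, end = BUSINESS_HOURS
    return start <= ctx.local_time <= end


if __name__ == "__main__":
    print(may_see_clear(AccessContext("alice", "10.1.2.3", time(9, 30), 5)))     # True
    print(may_see_clear(AccessContext("alice", "203.0.113.7", time(9, 30), 5)))  # False
```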

Clients get access to new and updated data immediately, so the problem of keeping masked copies up to date disappears.

Dynamic data masking implies that you control the database. You can (and probably should) monitor what the clients are doing. This is critical for forensic analysis if there is a problem later on (think Cambridge Analytica). In some environments, it may even be possible to enforce data confidentiality contractually, as long as you keep a close eye on how the clients are using the database.
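As a small illustration of monitoring, the sketch below uses SQLite's statement trace hook (`sqlite3.Connection.set_trace_callback`) to record every statement a client connection executes, along with a timestamp and a client identifier. This is only a stand-in: with a client-server database you would more likely rely on the server's own auditing features or on the logs of a masking proxy. The `attach_audit` helper and the in-memory list are hypothetical; a real audit trail should be append-only and stored out of the clients' reach.

```python
import sqlite3
from datetime import datetime, timezone


def attach_audit(conn: sqlite3.Connection, client_id: str, audit_log: list) -> None:
    """Record every SQL statement this connection executes, with time and client."""

    def record(statement: str) -> None:
        audit_log.append((datetime.now(timezone.utc).isoformat(), client_id, statement))

    conn.set_trace_callback(record)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (id INTEGER, name TEXT)")
    conn.execute("INSERT INTO employees VALUES (1, 'Alice')")
    conn.commit()

    log: list = []
    attach_audit(conn, "client-42", log)  # only statements from here on are logged
    conn.execute("SELECT name FROM employees WHERE id = ?", (1,)).fetchall()
    for entry in log:
        print(entry)
```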

Disadvantages

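Dynamic masking is potentially less secure, since users are, in fact, connecting to a database that contains the secret data. It turns out to be non-trivial to mask data reliably if the client accesses it using a sophisticated query language such as SQL: even if a column is never returned in clear, a user can often infer its values by filtering on it (for example, `WHERE salary > 100000`) and observing which rows come back. Microsoft specifically warns about this issue in its SQL Server data masking documentation. The risk can be reduced with query control, that is, restricting which queries clients are allowed to run, if that's an option.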
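To see why masking is hard to make reliable when clients can run arbitrary SQL, here is a minimal sketch with SQLite standing in for the database. The hypothetical `masked_select` never returns the `salary` column in clear, yet `infer_salary` recovers an exact value by binary search using nothing but `WHERE` predicates. The table, columns, and the assumed salary range of 0 to 1,000,000 are illustrative.

```python
import sqlite3


def masked_select(conn: sqlite3.Connection, sql: str, params=()) -> list:
    """Naive dynamic masking: the salary column is never returned in clear."""
    cur = conn.execute(sql, params)
    columns = [d[0] for d in cur.description]
    return [
        tuple("****" if col == "salary" else val for col, val in zip(columns, row))
        for row in cur.fetchall()
    ]


def infer_salary(conn: sqlite3.Connection, employee_id: int,
                 low: int = 0, high: int = 1_000_000) -> int:
    """Recover a masked integer salary using only WHERE predicates.

    Binary search; assumes the true salary is an integer in [low, high].
    """
    while low < high:
        mid = (low + high) // 2
        rows = masked_select(
            conn,
            "SELECT id FROM employees WHERE id = ? AND salary > ?",
            (employee_id, mid),
        )
        if rows:        # a row came back, so salary > mid
            low = mid + 1
        else:           # no row, so salary <= mid
            high = mid
    return low


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (id INTEGER, name TEXT, salary INTEGER)")
    conn.execute("INSERT INTO employees VALUES (1, 'Alice', 95000)")
    print(masked_select(conn, "SELECT name, salary FROM employees"))  # [('Alice', '****')]
    print(infer_salary(conn, 1))  # 95000, after about 20 queries
```

Output masking alone cannot stop this, because the secret value is still used when the database evaluates the filter. Query control, such as rejecting predicates on masked columns or rate-limiting suspicious query patterns, is one way to close the gap.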

Dynamic masking can also be a more complex solution, with more moving parts. The more complex the solution, the more likely it is that something will go wrong.

Conclusion

As is so often the case, there is no perfect solution: there is only a series of trade-offs that need to be weighed against the requirements.

If your data set is of a manageable size (and that is very much a relative concept here), it may be practical for you to make a copy of your database and do the masking on the copy. If you're OK with the disadvantages we have outlined, that's a great way to do it. Simple solutions are often the most secure.

But if it's impractical or undesirable to duplicate the data set, especially if you have multiple clients with multiple masking requirements, then dynamic masking may be your only realistic option. In that case, you'll have to consider whether the database can satisfy your requirements or whether a third-party solution is required. Even if you end up using the data masking capabilities provided by your database, you may still benefit from using a third-party tool to manage permissions and data classifications.
