What Is Differential Privacy?

Differential privacy has been around for a while but continues to gather momentum with its solid mathematical backing. Let's investigate.

John Cook · Nov. 08, 18 · Presentation

Differential privacy is a strong form of privacy protection with a solid mathematical definition.

Roughly speaking, a query is differentially private if it makes little difference whether your information is included or not. This intuitive idea can be made precise as follows.

Queries and Algorithms

First of all, differential privacy is something that applies to queries, not to databases. It could apply to a database if your query is to select everything in the database, but typically you want to run far more specific queries. Differential privacy adds noise to protect privacy, and the broader the query, the more noise must be added. Typically, you want queries to be narrower than "Tell me everything about everything."
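To make "adds noise" concrete, here is a minimal sketch of one common way to do it for a counting query: add Laplace noise with scale sensitivity/ε. The Laplace mechanism is not described in this article, so treat it as an illustrative assumption; the names `laplace_noise` and `noisy_count` and the sample data are hypothetical.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two independent exponentials with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_count(rows, predicate, epsilon, rng, sensitivity=1.0):
    # A count changes by at most 1 when one row is added or removed,
    # so its sensitivity is 1. Smaller epsilon means more noise.
    true_count = sum(1 for row in rows if predicate(row))
    return true_count + laplace_noise(sensitivity / epsilon, rng)

# Hypothetical data: how many people are 40 or older?
rng = random.Random(0)
ages = [23, 35, 41, 52, 67, 29, 38]
result = noisy_count(ages, lambda age: age >= 40, epsilon=0.5, rng=rng)
print(result)  # the true count is 3; the reported value is 3 plus random noise
```

A query with higher sensitivity, one whose answer can move further when a single row changes, would need a proportionally larger noise scale for the same ε.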

Generally, the differential privacy literature speaks of algorithms rather than queries. The two are the same if you take a sufficiently general view of what a query is, but using the word algorithm emphasizes that the query need not be a simple SQL query.

Opt-Out and Neighboring Databases

To quantify the idea of "whether your information is included or not," we look at two versions of a database: D and D'. These differ by one row, which you can think of as the row containing your data. Our formalism is symmetric, so we do not specify which of D and D' contains your row and which is missing it.

We should mention some fine print here. The paragraph above implicitly assumes that your database is just a simple table. In general, we can speak of neighboring databases. In the case of a more complex database design, the notion of what it means for two databases to be neighboring is more complex. Differential privacy can be defined whenever a notion of neighboring databases can be defined, so you could, for example, consider differential privacy for a non-relational ("NoSQL") database. But for the simple case of a single table, two databases are neighboring if they differ by a row.

However, there's a subtlety regarding what it means even for two tables to "differ by one row." Unbounded differential privacy assumes one row has been removed from one of the databases. Bounded differential privacy assumes the two databases have the same number of rows, but the data in one row has been changed. In practice, the difference between bounded and unbounded differential privacy doesn't often matter.
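The two notions of "differ by one row" can be sketched for a single table stored as a Python list of rows. This is only an illustration: the function names, the sample table, and the replacement row are all hypothetical.

```python
def unbounded_neighbors(table):
    """Tables obtained by removing exactly one row (one fewer row)."""
    return [table[:i] + table[i + 1:] for i in range(len(table))]

def bounded_neighbors(table, replacement_rows):
    """Tables with the same number of rows where exactly one row is changed."""
    neighbors = []
    for i, row in enumerate(table):
        for replacement in replacement_rows:
            if replacement != row:
                neighbors.append(table[:i] + [replacement] + table[i + 1:])
    return neighbors

# Hypothetical table of (name, age) rows.
table = [("alice", 34), ("bob", 51), ("carol", 29)]
unbounded = unbounded_neighbors(table)
bounded = bounded_neighbors(table, replacement_rows=[("dave", 40)])

print(len(unbounded), [len(t) for t in unbounded])
print(len(bounded), [len(t) for t in bounded])
```

Because the neighbor relation is symmetric, `unbounded_neighbors` only needs to enumerate removals: adding a row is the same relation read in the other direction.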

Quantifying Difference

We said it "makes little difference" whether an individual's data is included or not. How do we quantify that statement? Differential privacy is a stochastic procedure, so we need to measure the difference in terms of probability.

Whenever mathematicians want to speak of two things being close together, we often use ε as our symbol. While, in principle, ε could be large, the choice of symbol implies that you should think of it as a quantity you can make as small as you like (though there may be some cost that increases as ε decreases).

We're going to look at ratios of probabilities, so we want these ratios to be near 1. This means we'll work with e^ε = exp(ε) rather than ε itself. A convenient property that falls out of the Taylor series for the exponential function is that if ε is small, then exp(ε) is approximately 1 + ε. For example, exp(0.05) = 1.0513, and so if the ratio of two probabilities is bounded by exp(0.05), the probabilities are roughly within 5 percent of each other.
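A quick numerical check of the approximation, as a small sketch. The helper `probability_bounds` is a hypothetical name; it uses the fact that a ratio bounded by exp(ε) in both directions confines one probability to the interval from p·exp(-ε) to p·exp(ε) around the other.

```python
import math

def approximation_gap(eps):
    """How far exp(eps) is from its first-order approximation 1 + eps."""
    return math.exp(eps) - (1.0 + eps)

def probability_bounds(p, eps):
    """Range allowed for one probability when the other is p and the
    ratio is bounded by exp(eps) in both directions."""
    return p * math.exp(-eps), p * math.exp(eps)

for eps in (0.01, 0.05, 0.1, 0.5, 1.0):
    print(f"eps={eps:<5} exp(eps)={math.exp(eps):.4f}  1+eps={1 + eps:.4f}  "
          f"gap={approximation_gap(eps):.4f}")

low, high = probability_bounds(0.2, 0.05)
print(f"a probability of 0.2 can move to anywhere in [{low:.4f}, {high:.4f}]")
```

The gap grows roughly like ε²/2, so the "within ε percent" reading is only reliable for small ε.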

Definition of Differential Privacy

Now, we can finally state our definition. An algorithm A satisfies ε-differential privacy if for every t in the range of A, and for every pair of neighboring databases D and D',

    Pr(A(D) = t) / Pr(A(D') = t) ≤ exp(ε).

Here, it is understood that 0/0 = 1, i.e. if an outcome has zero probability under both databases, differential privacy holds.
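For a small enough example, the bound Pr(A(D) = t) / Pr(A(D') = t) ≤ exp(ε) can be checked by brute force. The sketch below assumes a toy algorithm that is not from this article: each row is a single bit, and A reports every bit flipped independently with probability `flip_prob` (a form of randomized response). It uses the bounded notion of neighbors and enumerates every pair of neighboring tables and every possible output.

```python
import itertools
import math

def output_probability(db, t, flip_prob):
    """Pr(A(db) = t) when A flips each bit of db independently."""
    prob = 1.0
    for true_bit, reported_bit in zip(db, t):
        prob *= flip_prob if true_bit != reported_bit else 1.0 - flip_prob
    return prob

def worst_case_ratio(n_rows, flip_prob):
    """Largest Pr(A(D) = t) / Pr(A(D') = t) over neighbors D, D' and outputs t."""
    tables = list(itertools.product([0, 1], repeat=n_rows))
    worst = 0.0
    for d in tables:
        for d_prime in tables:
            # Bounded neighbors: same size, exactly one row differs.
            if sum(a != b for a, b in zip(d, d_prime)) != 1:
                continue
            for t in tables:
                ratio = (output_probability(d, t, flip_prob)
                         / output_probability(d_prime, t, flip_prob))
                worst = max(worst, ratio)
    return worst

flip_prob = 0.25
epsilon = math.log((1 - flip_prob) / flip_prob)  # ln 3 for this toy algorithm
worst = worst_case_ratio(3, flip_prob)
print(worst, math.exp(epsilon))
```

Every output has positive probability here, so the 0/0 convention never comes into play. The worst ratio comes entirely from the one differing row, which is why this toy algorithm satisfies ε-differential privacy with ε = ln((1 - flip_prob) / flip_prob) regardless of the number of rows.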

Published at DZone with permission of John Cook, DZone MVB. See the original article here.