# Adding Laplace or Gaussian Noise to Database for Privacy

The idea of differential privacy is to guarantee bounds on how much information may be revealed by someone's participation in a database. These bounds are described by two numbers: ε (epsilon) and δ (delta). We're primarily interested in the multiplicative bound described by ε. This number is roughly the number of bits of information an analyst might gain regarding an individual.

The multiplicative bound is exp(ε) and so ε, the natural log of the multiplicative bound, would be the information measure, though technically in *nats* rather than *bits* since we're using natural logs rather than logs base 2.
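As a small illustration of the nats-versus-bits remark, the sketch below converts a privacy parameter from nats to bits by dividing by ln 2. The value of `epsilon` and the helper name `nats_to_bits` are assumptions chosen for the example, not something from the original discussion.

```python
import math


def nats_to_bits(epsilon: float) -> float:
    """Convert an information measure from nats to bits (divide by ln 2)."""
    return epsilon / math.log(2)


epsilon = 0.5  # hypothetical privacy parameter, chosen only for illustration

# The multiplicative bound on how much a probability can change is exp(epsilon).
multiplicative_bound = math.exp(epsilon)

print(f"multiplicative bound exp(epsilon): {multiplicative_bound:.4f}")
print(f"epsilon expressed in bits:         {nats_to_bits(epsilon):.4f}")
```

Since ln 2 is about 0.69, a value of ε in nats corresponds to a somewhat larger number of bits.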

In previous posts, we looked at a randomization scheme for protecting the privacy of a binary response. This post will look briefly at adding noise to continuous or unbounded data. I like to keep the posts here fairly short, but this topic is fairly technical. To keep it short, I'll omit some of the details and give more of an intuitive overview.

## Differential Privacy

The δ term is added to the multiplicative bound. Ideally, δ is 0, that is, we'd prefer (ε, 0)-differential privacy, but sometimes we have to settle for (ε, δ)-differential privacy. Roughly speaking, the δ term represents the possibility that a few individuals may stand to lose more privacy than the rest, that the multiplicative bound doesn't apply to everyone. If δ is very small, this risk is very small.
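For readers who want the formal statement behind this description, here is the usual definition written out. The symbols *M* (the randomized mechanism), *D* and *D′* (two databases differing in one individual's record), and *S* (a set of possible outputs) are notation introduced here for illustration. A mechanism is (ε, δ)-differentially private if, for all such *D*, *D′*, and *S*,

$$\Pr[M(D) \in S] \le e^{\varepsilon} \, \Pr[M(D') \in S] + \delta$$

Setting δ = 0 leaves only the multiplicative bound exp(ε), which is the (ε, 0) case.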

## Laplace Mechanism

The Laplace distribution is also known as the double exponential distribution because its density function looks like the exponential density function with a copy reflected about the *y*-axis; these two exponential curves join at the origin to create a sort of circus tent shape. The absolute value of a Laplace random variable is an exponential random variable.
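The relationship between the Laplace and exponential distributions can be checked numerically. The sketch below is a rough simulation using only Python's standard library. It assumes the known fact that the difference of two independent exponential variates with the same mean has a Laplace distribution. The helper name `sample_laplace`, the scale, and the sample size are hypothetical choices.

```python
import math
import random
import statistics


def sample_laplace(scale: float, rng: random.Random) -> float:
    """Draw one Laplace(0, scale) variate as a difference of two exponentials."""
    rate = 1.0 / scale  # expovariate takes a rate, i.e. 1 / mean
    return rng.expovariate(rate) - rng.expovariate(rate)


rng = random.Random(0)  # fixed seed so the run is repeatable
scale = 2.0             # hypothetical scale parameter
draws = [sample_laplace(scale, rng) for _ in range(100_000)]
abs_draws = [abs(x) for x in draws]

# An exponential variable with mean `scale` has mean and standard deviation
# both equal to `scale`, and exceeds `scale` with probability exp(-1).
tail_fraction = sum(1 for x in abs_draws if x > scale) / len(abs_draws)

print("mean of draws (should be near 0):     ", statistics.fmean(draws))
print("mean of |draws| (should be near scale):", statistics.fmean(abs_draws))
print("std of |draws| (should be near scale): ", statistics.pstdev(abs_draws))
print("P(|X| > scale) vs exp(-1):             ", tail_fraction, math.exp(-1))
```

The printed summaries should be close to the exponential values, though a simulation like this is only a sanity check, not a proof.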

Why are we interested in this particular distribution? Because we're interested in multiplicative bounds, and so it's not too surprising that exponential distributions might make the calculations work out because of the way the exponential scales multiplicatively.

The Laplace mechanism adds Laplace-distributed noise to the output of a function. If Δ*f* is the sensitivity of a function *f*, a measure of how revealing the function might be, then adding Laplace noise with scale Δ*f*/ε preserves (ε, 0)-differential privacy.
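Here is a minimal sketch of the Laplace mechanism applied to a counting query, using only the standard library. The data, the function names, and the choice of ε are hypothetical. The sensitivity of 1 assumes that adding or removing one person changes the count by at most 1.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) noise, drawn as a difference of two exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def laplace_mechanism(true_value: float, l1_sensitivity: float,
                      epsilon: float, rng: random.Random) -> float:
    """Return true_value plus Laplace noise with scale sensitivity / epsilon."""
    if epsilon <= 0 or l1_sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must be positive")
    return true_value + laplace_noise(l1_sensitivity / epsilon, rng)


# Hypothetical data: ages of people in a small database.
ages = [34, 45, 29, 61, 52, 38]

# Counting query: how many people are 40 or older?
true_count = sum(1 for age in ages if age >= 40)

rng = random.Random(42)
noisy_count = laplace_mechanism(true_count, l1_sensitivity=1.0,
                                epsilon=0.5, rng=rng)
print("true count: ", true_count)
print("noisy count:", noisy_count)
```

Smaller values of ε give a larger noise scale, so the released count is more private but less accurate.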

Technically, Δ*f* is the *l*₁ sensitivity. We need this detail because the results for Gaussian noise involve *l*₂ sensitivity. This is just a matter of whether we measure sensitivity by the *l*₁ (sum of absolute values) norm or the *l*₂ (root sum of squares) norm.
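To make the two norms concrete, the sketch below computes both for a single worst-case change vector. It assumes a hypothetical function returning four query answers, each of which one person's record can shift by at most 1. A real sensitivity is the maximum of such a norm over all pairs of neighboring databases, which this example simply takes as given.

```python
import math


def l1_norm(vector):
    """Sum of absolute values."""
    return sum(abs(x) for x in vector)


def l2_norm(vector):
    """Root sum of squares."""
    return math.sqrt(sum(x * x for x in vector))


# Hypothetical worst case: one person changes each of 4 answers by 1.
worst_case_change = [1.0, 1.0, 1.0, 1.0]

print("l1 sensitivity:", l1_norm(worst_case_change))  # 4.0
print("l2 sensitivity:", l2_norm(worst_case_change))  # 2.0
```

For a function with a single numeric output the two norms coincide. The difference only matters for vector-valued functions.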

## Gaussian Mechanism

The Gaussian mechanism protects privacy by adding noise drawn from the more familiar normal (Gaussian) distribution. Here, the results are a little messier. Let ε be strictly between 0 and 1 and pick δ > 0. Then, the Gaussian mechanism is (ε, δ)-differentially private provided the scale σ of the Gaussian noise satisfies:

$$\sigma \ge \sqrt{2 \ln(1.25/\delta)} \; \frac{\Delta_2 f}{\varepsilon}$$

where Δ₂*f* is the *l*₂ sensitivity of *f*.

It's not surprising that the *l*₂ norm appears in this context since the normal distribution and *l*₂ norm are closely related. It's also not surprising that a δ term appears; the Laplace distribution is ideally suited to multiplicative bounds but the normal distribution is not.
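A minimal sketch of the Gaussian mechanism follows. It assumes the commonly cited sufficient condition σ ≥ √(2 ln(1.25/δ)) · Δ₂*f* / ε for 0 < ε < 1. The function names and the numeric inputs are hypothetical, and tighter calibrations exist that this sketch does not attempt.

```python
import math
import random


def gaussian_sigma(l2_sensitivity: float, epsilon: float, delta: float) -> float:
    """Noise scale from the assumed bound sqrt(2 ln(1.25/delta)) * sens / eps."""
    if not 0 < epsilon < 1:
        raise ValueError("this bound assumes epsilon strictly between 0 and 1")
    if not 0 < delta < 1:
        raise ValueError("delta must be strictly between 0 and 1")
    if l2_sensitivity <= 0:
        raise ValueError("sensitivity must be positive")
    return math.sqrt(2.0 * math.log(1.25 / delta)) * l2_sensitivity / epsilon


def gaussian_mechanism(true_value: float, l2_sensitivity: float, epsilon: float,
                       delta: float, rng: random.Random) -> float:
    """Return true_value plus zero-mean Gaussian noise at the computed scale."""
    sigma = gaussian_sigma(l2_sensitivity, epsilon, delta)
    return true_value + rng.gauss(0.0, sigma)


rng = random.Random(7)
# Hypothetical query result with l2 sensitivity 1, epsilon = 0.5, delta = 1e-5.
sigma = gaussian_sigma(1.0, 0.5, 1e-5)
noisy_value = gaussian_mechanism(3.0, 1.0, 0.5, 1e-5, rng)
print("sigma:      ", sigma)
print("noisy value:", noisy_value)
```

Note how σ grows only slowly as δ shrinks, since δ enters through a logarithm, while it scales inversely with ε.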

Published at DZone with permission of John Cook, DZone MVB. See the original article here.
