Differential privacy (DP) is an approach that enables data analysis and machine learning (ML) with a mathematical guarantee on the privacy of user data. DP quantifies the "privacy cost" of an algorithm, i.e., the level of assurance that the algorithm's output distribution for a given dataset will not change significantly if the data of a single user is added or removed. The algorithm is characterized by two parameters, ε and δ, where smaller values of both indicate "more private". There is a natural tension between the privacy budget (ε, δ) and the utility of the algorithm: a smaller privacy budget requires the output to be "noisier", which often leads to lower utility. Thus, a fundamental goal of DP is to attain the highest possible utility for a desired privacy budget.
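Concretely, an algorithm A is (ε, δ)-differentially private if, for every pair of datasets D and D′ that differ in the data of a single user, and for every set S of possible outputs,

Pr[A(D) ∈ S] ≤ e^ε · Pr[A(D′) ∈ S] + δ.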
A key property of DP that often plays a central role in understanding privacy costs is that of composition, which reflects the net privacy cost of a combination of DP algorithms, viewed together as a single algorithm. A notable example is the differentially private stochastic gradient descent (DP-SGD) algorithm, which trains ML models over multiple iterations, each of which is differentially private, and therefore requires an application of the composition property of DP. A basic composition theorem in DP says that the privacy cost of a collection of algorithms is at most the sum of the privacy cost of each; for example, composing k algorithms that are each (ε, δ)-DP yields an algorithm that is (kε, kδ)-DP. However, in many cases this can be a gross overestimate, and various improved composition theorems give better estimates of the privacy cost of composition.
In 2019, we launched an open-source library (on GitHub) to allow developers to use DP-based analytics techniques. Today, we announce the addition to this library of Connect-the-Dots, a new privacy accounting algorithm based on a novel approach to discretizing privacy loss distributions, a useful tool for understanding the privacy cost of composition. This algorithm is based on the paper "Connect the Dots: Tighter Discrete Approximations of Privacy Loss Distributions", presented at PETS 2022. The main novelty of this accounting algorithm is that it uses an indirect approach to construct more accurate discretizations of privacy loss distributions. We find that Connect-the-Dots provides significant gains over other privacy accounting methods in the literature in terms of both accuracy and running time. This algorithm was also recently applied to the privacy accounting of DP-SGD in the training of ad prediction models.
Differential privacy and privacy loss distributions
A randomized algorithm is said to satisfy a DP guarantee if its output "does not depend significantly" on any single entry in its training dataset, quantified mathematically by the parameters (ε, δ). For example, consider the motivating example of DP-SGD. When trained with (non-private) SGD, a neural network could, in principle, encode the entire training dataset within its weights, allowing some training examples to be reconstructed from the trained model. On the other hand, when trained with DP-SGD, we have a formal guarantee that if one were able to reconstruct a training example with non-trivial probability, then one would also be able to reconstruct the same example even if it were not included in the training dataset.
The hockey stick divergence, parameterized by ε, is a measure of the distance between two probability distributions, as illustrated in the figure below. The privacy cost of most DP algorithms is dictated by the hockey stick divergence between two associated probability distributions P and Q. The algorithm satisfies DP with parameters (ε, δ) if the value of the hockey stick divergence for ε between P and Q is at most δ. The hockey stick divergence between (P, Q), denoted δ_{P||Q}(ε), is in turn fully characterized by an associated privacy loss distribution, denoted PLD_{P||Q}.
Illustration of the hockey stick divergence δ_{P||Q}(ε) between distributions P and Q (left), which corresponds to the probability mass of P that lies above e^ε·Q, where e^ε·Q is an e^ε scaling of the probability mass of Q (right).
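For intuition, over a finite set of outcomes the divergence is simply the total mass of P that exceeds e^ε·Q. A minimal numpy sketch (the toy distributions below are our own illustration, not taken from the paper):

```python
import numpy as np

def hockey_stick_divergence(p, q, epsilon):
    """Total probability mass of p lying above exp(epsilon) * q."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.maximum(p - np.exp(epsilon) * q, 0.0).sum()

# Toy output distributions of a mechanism on two neighboring datasets.
p = [0.5, 0.3, 0.2]
q = [0.3, 0.3, 0.4]
print(hockey_stick_divergence(p, q, epsilon=0.1))  # ~0.168
```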
The main advantage of working with PLDs is that composition of algorithms corresponds to convolution of the corresponding PLDs. Exploiting this fact, prior work has designed efficient algorithms to compute the PLD of a composition of individual algorithms by simply convolving the individual PLDs using the fast Fourier transform (FFT) algorithm.
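As an illustration, once a PLD has been discretized to probability masses on an equally spaced grid of privacy-loss values (more on discretization below), composing mechanisms amounts to convolving the mass arrays, with the grid offsets adding up. A hedged sketch with made-up toy numbers, using scipy's FFT-based convolution:

```python
import numpy as np
from scipy.signal import fftconvolve

GRID = 0.1  # grid spacing; the mass pmf[i] sits at loss (offset + i) * GRID

def compose_plds(pmf_a, offset_a, pmf_b, offset_b):
    """PLD of the composition: convolve the mass arrays, add the offsets."""
    return fftconvolve(pmf_a, pmf_b), offset_a + offset_b

# PLD of a single (toy) mechanism, self-composed over 3 steps in total.
pmf, offset = np.array([0.2, 0.5, 0.3]), -1  # losses at -0.1, 0.0, 0.1
total_pmf, total_offset = pmf, offset
for _ in range(2):
    total_pmf, total_offset = compose_plds(total_pmf, total_offset, pmf, offset)
```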
However, one challenge when dealing with many PLDs is that they are often continuous distributions, which makes the convolution operations intractable in practice. Therefore, researchers often apply various discretization approaches to approximate PLDs using equally spaced points. For example, the basic version of the Privacy Buckets algorithm assigns the probability mass of the interval between two discretization points entirely to the upper end of the interval.
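A rounding-up discretization of this kind can be sketched as follows (our own minimal illustration of the idea, not the library's implementation). Assigning each interval's mass to the upper end keeps the estimate pessimistic, i.e., a valid upper bound:

```python
import numpy as np
from collections import defaultdict

def discretize_pessimistically(losses, masses, grid_step):
    """Round each privacy-loss value up to the next grid point and
    accumulate its probability mass there."""
    buckets = defaultdict(float)
    for loss, mass in zip(losses, masses):
        buckets[int(np.ceil(loss / grid_step))] += mass
    indices = sorted(buckets)
    return [i * grid_step for i in indices], [buckets[i] for i in indices]

# Toy PLD with masses at arbitrary (non-grid) loss values.
grid_losses, grid_masses = discretize_pessimistically(
    losses=[-0.23, 0.04, 0.31], masses=[0.3, 0.5, 0.2], grid_step=0.1)
```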
Connect the dots: a new algorithm
Our new Connect-the-Dots algorithm provides a better way to discretize PLDs towards the goal of estimating hockey stick divergences. This approach works indirectly by first discretizing the hockey stick divergence function and then mapping it back to a discrete PLD supported on equally spaced points.
Illustration of the high-level steps in the Connect-the-Dots algorithm.
This approach is based on the notion of a "dominating PLD", namely, PLD_{P′||Q′} dominates PLD_{P||Q} if the hockey stick divergence of the former is greater than or equal to the hockey stick divergence of the latter for all values of ε. A key property of dominating PLDs is that they remain dominating after composition. Thus, for the purposes of privacy accounting, it suffices to work with a dominating PLD, which gives an upper bound on the exact privacy cost.
Our main insight behind the Connect-the-Dots algorithm is a characterization of discrete PLDs: namely, a PLD is supported on a given finite set of ε values if and only if the corresponding hockey stick divergence, viewed as a function of e^ε, is linear between consecutive e^ε values. This allows us to discretize the hockey stick divergence by simply connecting the dots to obtain a piecewise-linear function that exactly equals the hockey stick divergence function at the given e^ε values. See the paper for a more detailed explanation of the algorithm.
Comparison of discretizations of the hockey stick divergence by Connect-the-Dots vs. Privacy Buckets.
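To make the inverse mapping concrete, here is a minimal numpy sketch derived from the characterization above (our illustrative reconstruction, not the library's implementation). It takes hockey stick divergence values at the chosen ε grid, interpolates them piecewise-linearly as a function of e^ε, and reads off the PLD mass at each interior grid point from the change in slope; the masses at the two endpoints (including the privacy loss value +∞) need separate handling, which we omit here.

```python
import numpy as np

def pld_from_divergences(epsilons, deltas):
    """Recover discrete PLD masses at interior grid points from the hockey
    stick divergence values deltas[j] = delta(epsilons[j]).

    Since delta is piecewise linear in x = e^eps between consecutive
    support points, the mass at eps_j equals e^{eps_j} times the increase
    in slope at x_j. The deltas are assumed to come from an actual
    (dominating) PLD; endpoint masses are omitted in this sketch.
    """
    eps = np.asarray(epsilons, dtype=float)
    x = np.exp(eps)
    d = np.asarray(deltas, dtype=float)
    slopes = np.diff(d) / np.diff(x)     # slope of each linear piece in x
    masses = x[1:-1] * np.diff(slopes)   # mass at eps_1, ..., eps_{n-2}
    return eps[1:-1], masses
```

As a sanity check, feeding in δ(ε) = max(1 − e^ε, 0) sampled on a grid around 0, i.e., the divergence when P = Q, recovers a unit point mass at ε = 0.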
Experimental evaluation
The DP-SGD algorithm involves a noise multiplier parameter, which controls the magnitude of noise added in each gradient step, and a sampling probability, which controls how many examples are included in each minibatch. We compared Connect-the-Dots against other privacy accounting algorithms (Rényi DP based accounting, Microsoft's PRV Accountant, and the Privacy Buckets approach in the Google-DP library) on the DP-SGD privacy accounting task, with noise multiplier = 0.5, sampling probability = 0.2 × 10⁻⁴, and δ = 10⁻⁸.
We plot the value of ε computed by each of the algorithms against the number of composition steps and, additionally, the running time of the implementations. As shown in the plots below, Rényi DP based accounting provides a loose estimate of the privacy loss. However, when comparing the PLD-based approaches, we find that in this example the Connect-the-Dots implementation achieves a tighter estimate of the privacy loss, with a running time that is 5x faster than Microsoft's PRV Accountant and more than 200x faster than the previous Privacy Buckets approach in the Google-DP library.
Left: Upper bounds on the privacy parameter ε for a varying number of DP-SGD steps, as returned by the different algorithms (for fixed δ = 10⁻⁸). Right: Running times of the different algorithms.
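For reference, an experiment of this shape can be driven from the library's Python accounting API. The following is a hedged sketch assuming the dp_accounting package's PLD interface; the function and parameter names (from_gaussian_mechanism, sampling_prob, use_connect_dots, self_compose, get_epsilon_for_delta) reflect our understanding of the API and may differ across releases:

```python
from dp_accounting.pld import privacy_loss_distribution

# PLD of one DP-SGD step: a subsampled Gaussian mechanism.
# Assumed API and parameter names; check your installed version.
pld = privacy_loss_distribution.from_gaussian_mechanism(
    standard_deviation=0.5,   # noise multiplier (with sensitivity 1)
    sampling_prob=0.2e-4,     # minibatch sampling probability
    use_connect_dots=True)    # the new discretization described above

# Compose over many steps, then query epsilon at the target delta.
composed = pld.self_compose(1000)
print(composed.get_epsilon_for_delta(1e-8))
```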
Conclusion and future directions
This work proposes Connect-the-Dots, a new algorithm for computing optimal privacy parameters for compositions of differentially private algorithms. When evaluated on the DP-SGD accounting task, we found that this algorithm gives tighter estimates of the privacy loss with significantly faster running time.
So far, the library only supports the pessimistic-estimate version of the Connect-the-Dots algorithm, which provides an upper bound on the privacy loss of DP algorithms. However, the paper also introduces a variant of the algorithm that provides an "optimistic" estimate of the PLD, which can be used to derive lower bounds on the privacy cost of DP algorithms (provided those admit a "worst case" PLD). The library currently supports optimistic estimates as provided by the Privacy Buckets algorithm, and we hope to incorporate the Connect-the-Dots version as well.
Acknowledgments
This work was carried out in collaboration with Vadym Doroshenko, Badih Ghazi, Ravi Kumar. We thank Galen Andrew, Stan Bashtavenko, Steve Chien, Christoph Dibak, Miguel Guevara, Peter Kairouz, Sasha Kulankhina, Stefan Mellem, Jodi Spacek, Yurii Sushko, and Andreas Terzis for their help.