Data analysis revolves around the central goal of aggregating metrics. When the data points correspond to personally identifiable information, such as individual users' logs or activity, aggregation must be done privately. Differential privacy (DP) bounds the influence that any single data point can have on the outcome of a computation, and it has therefore become the most widely accepted approach to individual privacy.
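To make the bounded-influence idea concrete, here is a minimal sketch (not the paper's algorithm) of the classic Laplace mechanism for a private sum: each value is clipped so that one individual can shift the true sum by at most `clip`, and Laplace noise with scale `clip / epsilon` then suffices for ε-DP. The function name and parameters are illustrative.

```python
import numpy as np

def private_sum(values, epsilon, clip=1.0, rng=None):
    """Release a sum with epsilon-DP via the Laplace mechanism.

    Each value is clipped to [0, clip], so a single individual's
    contribution (the sensitivity of the sum) is at most `clip`.
    """
    if rng is None:
        rng = np.random.default_rng()
    clipped = np.clip(values, 0.0, clip)
    noise = rng.laplace(loc=0.0, scale=clip / epsilon)
    return clipped.sum() + noise
```

Note that the noise scale depends only on the worst-case sensitivity `clip`, not on the actual data, which is exactly the worst-case constraint the article discusses next.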
Although differentially private algorithms exist in theory, in practice they are often less efficient and less accurate than their non-private counterparts. In particular, differential privacy is a worst-case requirement: it must hold for any two neighboring data sets, regardless of how they were constructed, even if they are not sampled from any distribution. This means that unlikely "outlier" points with a large influence on the aggregate must still be accounted for in the privacy analysis, which can cost a significant amount of accuracy.
Recent research from Google and Tel Aviv University provides a generic framework for preprocessing the data so that it is guaranteed to be "friendly." Once the data is known to be friendly, the private aggregation stage can be carried out without accounting for potentially influential "unfriendly" elements. Because the aggregation stage is no longer constrained by the worst case, the proposed method can substantially reduce the amount of noise added at this stage.
The researchers first formally define the conditions under which a data set is considered friendly. These conditions depend on the type of aggregation required, but they always cover data sets on which the sensitivity of the aggregate is low. For example, if the aggregate is the average, "friendly" should include compact data sets, since replacing one point in a tightly clustered data set moves the average very little.
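A toy illustration of why compactness helps (illustrative, not taken from the paper): when one point in a set of n points is replaced, the average can move by at most diameter/n, so a promise of small diameter shrinks the sensitivity, and hence the required noise, proportionally.

```python
def avg_sensitivity(n, diameter):
    """Maximum change in the mean of n points when one point is
    replaced, given that all points lie in a set of the stated
    diameter."""
    return diameter / n

# Worst-case data spanning a range of 1000 vs. a "friendly"
# (compact) data set of diameter 1: the noise needed for the
# same privacy level shrinks by a factor of 1000.
print(avg_sensitivity(100, 1000.0))  # 10.0
print(avg_sensitivity(100, 1.0))     # 0.01
```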
The team developed FriendlyCore, a filter that reliably extracts a sizeable friendly subset (the core) of the input. The algorithm is designed to meet two criteria:
- It removes outliers, retaining in the core only elements that are close to many other elements.
- For neighboring data sets that differ in a single element y, the filter returns every element other than y with almost the same probability. The cores extracted from such neighboring data sets can therefore be coupled in the privacy analysis.
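As a rough illustration of the filtering idea, here is a deterministic caricature (the paper's actual filter is randomized, keeping each point with a carefully chosen probability so that the stability property above holds): keep a point only if it is close to many of the other points. All names and thresholds here are illustrative.

```python
import numpy as np

def simple_core(points, radius, min_neighbors):
    """Deterministic caricature of a friendly-core filter: keep a
    point if at least `min_neighbors` other points lie within
    `radius` of it. The real FriendlyCore filter instead makes
    soft, probabilistic choices so that neighboring data sets
    yield nearly identical cores."""
    points = np.asarray(points, dtype=float)
    # Pairwise Euclidean distances via broadcasting.
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    close = (dists <= radius).sum(axis=1) - 1  # exclude self
    return points[close >= min_neighbors]
```

For example, on four tightly clustered points plus one far-away outlier, the outlier has no near neighbors and is dropped, leaving a compact core.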
The team then designed friendly DP aggregation algorithms, which satisfy a weaker notion of privacy and can therefore add less noise to the result. They showed that applying a friendly DP aggregation method to the core produced by a filter satisfying the conditions above yields a composition that is differentially private in the conventional sense. Beyond averaging, this aggregation approach also applies to clustering and to learning the covariance matrix of a Gaussian distribution.
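Under the assumptions of the toy filter above, the end-to-end pipeline can be sketched as: filter to a compact core, then average with noise calibrated to the core's small promised diameter rather than to the worst case. This is only a schematic sketch with hypothetical parameters; the actual FriendlyCore analysis is what proves the composition is differentially private.

```python
import numpy as np

def friendly_private_mean(points, radius, min_neighbors, epsilon, rng=None):
    """Sketch of the two-stage pipeline: drop outliers, then add
    Laplace noise scaled to the sensitivity implied by the core's
    diameter (at most 2 * radius here) rather than by the full
    data range. Illustrative only."""
    if rng is None:
        rng = np.random.default_rng()
    pts = np.asarray(points, dtype=float)
    dists = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    keep = (dists <= radius).sum(axis=1) - 1 >= min_neighbors
    core = pts[keep]
    sensitivity = 2 * radius / max(len(core), 1)  # diameter / n
    noise = rng.laplace(scale=sensitivity / epsilon, size=core.shape[1])
    return core.mean(axis=0) + noise
```

The key point is that `sensitivity` depends on `radius`, a property guaranteed by the filtering stage, which is what lets the aggregation stage escape the worst-case noise scale.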
The researchers evaluated the effectiveness of FriendlyCore-based algorithms under the Zero-Concentrated Differential Privacy (zCDP) model, averaging 800 samples drawn from a Gaussian distribution with an unknown mean. As a baseline, they compared against the CoinPress algorithm. Unlike FriendlyCore, CoinPress requires an upper bound R on the norm of the mean. The proposed method is independent of the upper-bound and dimension parameters, and it therefore outperforms CoinPress when those parameters are large.
The team also evaluated the effectiveness of their private k-means clustering technique by comparing it to LSH clustering, a method based on recursive locality-sensitive hashing. Each experiment was repeated 30 times. For small values of n (the number of samples drawn from the mixture), FriendlyCore often fails and produces inaccurate results. However, as n grows, the proposed technique succeeds with increasing probability (as the tuples it creates become closer to each other) and produces very accurate results, while LSH clustering lags behind. FriendlyCore also performs well on large data sets, even without clear separation into clusters.
Check out the paper and reference article. All credit for this research goes to the researchers on this project.
Tanushree Shenwai is a consulting intern at MarktechPost. She is currently pursuing her B.Tech at the Indian Institute of Technology (IIT), Bhubaneswar. She is a data science enthusiast with a strong interest in the applications of artificial intelligence across various fields. She is passionate about exploring new advances in technology and their real-life applications.