The ABCs of Differential Privacy
MASTERING BASICS
Differential privacy (DP) is a rigorous mathematical framework that permits the analysis and manipulation of sensitive data while providing robust privacy guarantees.
DP is based on the premise that the inclusion or exclusion of any single individual's data should not significantly change the result of any analysis or query carried out on the dataset as a whole. In other words, an algorithm should produce comparable findings on two datasets that differ only in that individual's record, making it difficult to infer anything specific about that individual. This guarantee keeps private information from leaking while still allowing useful insights to be drawn from the data.
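To make the idea of "comparable findings" concrete, here is a minimal, made-up illustration: a simple counting query run on two datasets that differ only in one individual's record. The names and values below are hypothetical and exist purely for demonstration.

```python
# Toy illustration (made-up records): a counting query on two
# "neighboring" datasets that differ only in John's record.

dataset_with_john = [
    {"name": "Alice", "has_condition": True},
    {"name": "Bob", "has_condition": False},
    {"name": "John", "has_condition": True},
]
dataset_without_john = [r for r in dataset_with_john if r["name"] != "John"]

def count_with_condition(records):
    """Counting query: how many participants report the condition?"""
    return sum(r["has_condition"] for r in records)

# The two exact answers differ by at most 1, which is precisely the gap
# a differentially private mechanism has to mask with noise.
print(count_with_condition(dataset_with_john))     # 2
print(count_with_condition(dataset_without_john))  # 1
```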
Differential privacy was first introduced in the paper "Differential Privacy" by Cynthia Dwork [1], written while she was working at Microsoft Research.
Let's take a look at an example to better understand how differential privacy helps to protect data.
Examples of How Differential Privacy Safeguards Data
Example 1
In a study examining the link between social class and health outcomes, researchers ask participants for sensitive information such as where they live, how much they earn, and their medical history [2].
John, one of the participants, is worried that his personal information could leak and hurt his applications for life insurance or a mortgage. To address John's concerns, the researchers can use differential privacy, which ensures that any data released will not reveal specific information about him. The privacy ideal is captured by John's "opt-out" scenario, in which his data is left out of the study entirely: his privacy is protected because the results of the analysis are not tied to any of his personal details.
Differential privacy seeks to protect privacy in the real-world analysis almost as if the data were being analyzed in that opt-out scenario. Since John's data would not be part of the computation, the results can reveal about him only what can already be inferred from the data available on everyone else.
A precise definition of differential privacy requires formal mathematical language and technical concepts, but the basic idea is to limit what the released data can reveal about any individual, thereby ensuring that each person's sensitive information remains private.
Example 2
The U.S. Census Bureau used a differential privacy framework as part of its disclosure avoidance strategy to strike a balance between its data collection and reporting needs and the privacy concerns of respondents. You can find more information about the confidentiality protection provided by the U.S. Census Bureau [here](https://mit-serc.pubpub.org/pub/differential-privacy-2020-us-census). Moreover, Garfinkel provides an explanation of how DP was used for the 2020 US Census data here.
Definition and key concepts
The meaning of "differential" within the realm of DP
The term "differential" privacy refers to its emphasis on the dissimilarity between the results produced by a privacy-preserving algorithm on two datasets that differ by just one individual's data.
Mechanism M
A mechanism M is a mathematical method or process that is applied to the data to preserve privacy while still producing useful information.
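As a minimal sketch of such a mechanism (illustrative only, not taken from any particular library), the widely used Laplace mechanism handles numeric queries by adding noise whose scale depends on the query's sensitivity and the chosen ε:

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon):
    """Release a noisy answer by adding Laplace noise of scale sensitivity/epsilon.

    - true_value:  the exact query result computed on the private data
    - sensitivity: the most the query can change when one record is added or removed
    - epsilon:     the privacy parameter (smaller = more noise = more privacy)
    """
    scale = sensitivity / epsilon
    return true_value + np.random.laplace(loc=0.0, scale=scale)

# A counting query has sensitivity 1, since one person changes the count
# by at most 1. Release a count of 42 with epsilon = 0.5:
print(laplace_mechanism(true_value=42, sensitivity=1, epsilon=0.5))
```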
Epsilon (ε)
ε is a privacy parameter that controls the level of privacy given by a differentially private mechanism. In other words, ε regulates how much the output of the mechanism can vary between two neighboring databases and measures how much privacy is lost when the mechanism is run on the database [3].
Stronger privacy guarantees are provided by a smaller ε, but the output may be less useful as a result [4]. ε controls the amount of noise added to the data and determines how much the output probability distribution can change when a single person's data is altered.
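To see this trade-off concretely, the hypothetical snippet below reuses the Laplace mechanism sketched above and measures how far the released answers typically fall from the true count for several values of ε:

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon):
    """Add Laplace noise of scale sensitivity/epsilon (as sketched above)."""
    return true_value + np.random.laplace(scale=sensitivity / epsilon)

true_count = 100
for epsilon in (0.1, 1.0, 10.0):
    # Release the same count many times and check the typical error.
    releases = [laplace_mechanism(true_count, sensitivity=1, epsilon=epsilon)
                for _ in range(10_000)]
    avg_error = np.mean([abs(r - true_count) for r in releases])
    print(f"epsilon={epsilon:>4}: average absolute error ~ {avg_error:.2f}")

# Smaller epsilon -> more noise -> stronger privacy, but a less accurate output.
```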