Group Members
- Maria Isabel German
- Achu Mary Philip
- Kamaldeep Kaur
Introduction
In an era where data privacy is a growing concern, k-anonymity plays a crucial role in protecting sensitive information while preserving data utility. A dataset is k-anonymous if the quasi-identifiers of each person in the dataset (attributes such as age or ZIP code that can identify someone when combined) are identical to those of at least k − 1 other people in the dataset [1]. An attacker who knows a person’s quasi-identifiers therefore cannot narrow that person down to fewer than k records. Achieving k-anonymity without discarding more information than necessary requires effective generalization strategies, which is where different algorithms come into play.
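The definition translates directly into a check: group the records by their quasi-identifier values and confirm that the smallest group has at least k members. The minimal Python sketch below assumes records are dictionaries and that the caller names the quasi-identifier columns; the function names, column names, and sample rows are hypothetical.

```python
from collections import Counter


def smallest_class_size(rows, quasi_identifiers):
    """Return the size of the smallest group of rows sharing the same quasi-identifier values."""
    if not rows:
        raise ValueError("rows must not be empty")
    classes = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(classes.values())


def is_k_anonymous(rows, quasi_identifiers, k):
    """A table is k-anonymous if every quasi-identifier combination is shared by at least k rows."""
    return smallest_class_size(rows, quasi_identifiers) >= k


# Hypothetical, already generalized rows; "diagnosis" is the sensitive attribute
# and is deliberately left out of the grouping.
rows = [
    {"age": "20-29", "zip": "130**", "diagnosis": "flu"},
    {"age": "20-29", "zip": "130**", "diagnosis": "asthma"},
    {"age": "30-39", "zip": "148**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "148**", "diagnosis": "diabetes"},
    {"age": "30-39", "zip": "148**", "diagnosis": "flu"},
]

print(smallest_class_size(rows, ["age", "zip"]))  # 2
print(is_k_anonymous(rows, ["age", "zip"], 2))    # True
print(is_k_anonymous(rows, ["age", "zip"], 3))    # False
```

Here the table is 2-anonymous but not 3-anonymous, because the ("20-29", "130**") group contains only two people.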
This project presents a comparative analysis of two k-anonymization approaches: MinGen Brute Force and the Greedy Algorithm. By comparing how effectively each achieves k-anonymity, how long each takes to run, and how each scores on a certainty metric, we explore their trade-offs and assess which method better balances privacy protection and data usability.
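To make the contrast between the two approaches concrete, here is a minimal Python sketch of an exhaustive search and a greedy search over generalization levels, on a hypothetical eight-record table with two quasi-identifiers (age and ZIP code). Several details are assumptions for illustration and are not necessarily what this project’s notebooks use: the generalization hierarchies, the toy records, the “lowest total generalization level” criterion used to pick the brute-force winner, and the Datafly-style greedy rule (always generalize the column with the most distinct values). Record suppression is omitted to keep the sketch short.

```python
from collections import Counter
from itertools import product

# Hypothetical toy records: (age, ZIP code). Not data from this project.
RECORDS = [
    (23, "13053"), (23, "13054"), (37, "13053"), (37, "13054"),
    (41, "13053"), (41, "13054"), (55, "13053"), (55, "13054"),
]


def gen_age(age, level):
    """Assumed age hierarchy: 0 = exact, 1 = 10-year band, 2 = suppressed."""
    if level == 0:
        return str(age)
    if level == 1:
        low = age // 10 * 10
        return f"{low}-{low + 9}"
    return "*"


def gen_zip(zipcode, level):
    """Assumed ZIP hierarchy: level L masks the last L digits (0..5)."""
    return zipcode[: len(zipcode) - level] + "*" * level


# One (generalizer, maximum level) pair per quasi-identifier column.
HIERARCHIES = [(gen_age, 2), (gen_zip, 5)]


def generalize(records, levels):
    """Apply one generalization level per column to every record."""
    return [
        tuple(fn(value, lvl) for (fn, _), value, lvl in zip(HIERARCHIES, rec, levels))
        for rec in records
    ]


def is_k_anonymous(rows, k):
    """Every row here holds only quasi-identifiers; each combination must occur >= k times."""
    return min(Counter(rows).values()) >= k


def brute_force(records, k):
    """Check every level vector; keep the k-anonymous one with the lowest total level.

    Returns None if no generalization satisfies k (for example, k > len(records)).
    """
    best = None
    for levels in product(*(range(max_lvl + 1) for _, max_lvl in HIERARCHIES)):
        if is_k_anonymous(generalize(records, levels), k):
            if best is None or sum(levels) < sum(best):
                best = levels
    return best


def greedy(records, k):
    """Datafly-style heuristic: repeatedly generalize the column with the most distinct values."""
    if k > len(records):
        raise ValueError("k cannot exceed the number of records")
    levels = [0] * len(HIERARCHIES)
    rows = generalize(records, levels)
    while not is_k_anonymous(rows, k):
        # Only columns that are not yet fully generalized are candidates.
        candidates = [i for i, (_, max_lvl) in enumerate(HIERARCHIES) if levels[i] < max_lvl]
        col = max(candidates, key=lambda i: len({row[i] for row in rows}))
        levels[col] += 1
        rows = generalize(records, levels)
    return tuple(levels)


print("brute force:", brute_force(RECORDS, 2))  # (0, 1): exact ages, ZIPs like 1305*
print("greedy:     ", greedy(RECORDS, 2))       # (2, 0): ages fully suppressed
```

With k = 2 on this toy table, the exhaustive search keeps exact ages and masks one ZIP digit, while the greedy rule suppresses age entirely because age has the most distinct values. In exchange, the greedy search evaluates only three level vectors instead of all 18, which is the kind of execution-time versus data-quality trade-off this study examines.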
Why This Study Matters
Choosing the right anonymization approach is critical in real-world applications such as healthcare, finance, and marketing [2], where data must be shared while protecting individuals’ privacy. A poorly implemented k-anonymization strategy can either over-generalize or over-suppress data, reducing its usefulness, or under-protect it, leaving individuals vulnerable to re-identification attacks.
This study aims to answer:
- How do MinGen Brute Force and Greedy approaches compare in terms of execution time and certainty?
- Does increasing k affect suppression differently in the two algorithms?
- How does dataset size impact execution time?
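The comparisons above rely on a certainty metric, which this overview does not define. As an assumption, one common formulation is a normalized certainty penalty: a numeric value generalized to the interval [low, high] costs (high - low) / (domain_max - domain_min), so an exact value costs 0, a fully suppressed value costs 1, and a lower average means more certain, more useful data. The minimal Python sketch below covers a single numeric column; the function name and sample values are hypothetical, and the project’s own metric may differ.

```python
def numeric_certainty_penalty(intervals, domain_min, domain_max):
    """Average normalized certainty penalty for one generalized numeric column.

    Each generalized value is a (low, high) interval; an exact value v is (v, v).
    Assuming every interval lies inside the domain, the result is in [0, 1]:
    0 means no generalization and 1 means every value is fully suppressed.
    """
    width = domain_max - domain_min
    if width <= 0:
        raise ValueError("domain_max must be greater than domain_min")
    if not intervals:
        raise ValueError("intervals must not be empty")
    return sum((high - low) / width for low, high in intervals) / len(intervals)


# Hypothetical ages with an assumed domain of 20 to 60.
exact = [(23, 23), (37, 37)]
banded = [(20, 29), (30, 39)]
suppressed = [(20, 60), (20, 60)]

print(numeric_certainty_penalty(exact, 20, 60))       # 0.0
print(numeric_certainty_penalty(banded, 20, 60))      # 0.225
print(numeric_certainty_penalty(suppressed, 20, 60))  # 1.0
```

Computing such a score for the output of each algorithm, at each k and dataset size, gives a single number per run that can be plotted alongside execution time.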
Teaching Aid and Structure
The teaching aid for this project is a PowerPoint presentation that covers:
- Concept Overview
- Algorithm Workflows
- Comparative Analysis
- Key Observations
Although this project does not include a live coding demonstration, all experiments were run in Jupyter Notebook, which we recommend for k-anonymization experiments because it makes it easy to run code step by step and visualize the results. The focus here is not on writing complex code but on understanding the trade-offs between the two approaches.
What You Will Learn
By engaging with this teaching aid, you will:
- Understand k-anonymity and its significance in privacy protection.
- See how MinGen Brute Force and Greedy approaches work in anonymizing data.
- Analyze execution time and the certainty metric across different dataset sizes and values of k.
- Compare the efficiency and trade-offs of both algorithms.
Conclusion
This comparative study distills k-anonymization into an accessible framework, helping you decide when to choose MinGen Brute Force and when to opt for the Greedy Algorithm. By keeping the implementation simple and focusing on comparative insights, this project serves as a foundational guide for exploring data anonymization techniques. For those interested in running similar analyses, Jupyter Notebook is an excellent tool for testing different generalization strategies, adjusting k, and observing the impact on execution time and the certainty metric. By the end of this study, you’ll have a clear understanding of the strengths and weaknesses of each algorithm, helping you make informed decisions about privacy-preserving data processing.
Ready to dive in? Let’s explore k-anonymization together!
References
- [1] Google Cloud. “Compute k-Anonymity.” Google Cloud Documentation. https://cloud.google.com/sensitive-data-protection/docs/compute-k-anonymity
- [2] K2View Blog. “What is K Anonymity and Why Data Pros Care.” https://www.k2view.com/blog/what-is-k-anonymity