Group Members

Maria Isabel German
Achu Mary Philip
Kamaldeep Kaur

Introduction

In an era where data privacy is a growing concern, k-anonymity plays a crucial role in protecting sensitive information while preserving data utility. A dataset is k-anonymous if each person's quasi-identifier values (attributes such as age or ZIP code that can identify someone when combined) are identical to those of at least k – 1 other people in the dataset ^[1], which makes it harder for attackers to re-identify individuals from those quasi-identifiers. Achieving k-anonymity with as little information loss as possible, however, requires effective generalization strategies, which is where different algorithms come into play.
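To make the definition concrete, here is a minimal Python sketch of a k-anonymity check. It is only an illustration and is not taken from the project's notebook: the function name is_k_anonymous, the column names, and the sample records are all hypothetical.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs at least k times.

    `records` is a list of dicts; `quasi_identifiers` lists the keys to group on.
    (Hypothetical helper, for illustration only.)
    """
    # Count how many records share each combination of quasi-identifier values.
    group_sizes = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    # The dataset is k-anonymous when no group is smaller than k.
    return all(size >= k for size in group_sizes.values())

# Made-up sample data: age and ZIP code are already generalized.
records = [
    {"age": "30-39", "zip": "130**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "130**", "diagnosis": "asthma"},
    {"age": "40-49", "zip": "148**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "148**", "diagnosis": "diabetes"},
]

print(is_k_anonymous(records, ["age", "zip"], k=2))  # True: each group has 2 records
print(is_k_anonymous(records, ["age", "zip"], k=3))  # False: no group has 3 records
```

Each group of records that share the same quasi-identifier values is often called an equivalence class; the check simply asks whether the smallest class has at least k members.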

This project presents a comparative analysis of two k-anonymization approaches: MinGen Brute Force and the Greedy Algorithm. By evaluating their effectiveness, execution time, and certainty metric, we explore their trade-offs and determine which method best balances privacy protection and data usability.

Why This Study Matters

Choosing the right anonymization approach is critical in real-world applications such as healthcare, finance, and marketing ^[2], where data must be shared while protecting individuals’ privacy. A poorly implemented k-anonymization strategy can either over-suppress data, reducing its usefulness, or under-protect it, leaving individuals vulnerable to re-identification attacks.

This study aims to answer:

How do MinGen Brute Force and Greedy approaches compare in terms of execution time and certainty?
Does increasing k affect suppression differently in the two algorithms?
How does dataset size impact execution time?
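As a rough illustration of why k and suppression are linked, the sketch below counts how many records would need to be suppressed for different values of k, assuming that any record in a group smaller than k is suppressed. This is neither the MinGen Brute Force nor the Greedy algorithm; the function name records_to_suppress and the sample rows are hypothetical.

```python
from collections import Counter

def records_to_suppress(rows, k):
    """Count records whose quasi-identifier group is smaller than k.

    `rows` is a list of tuples of (already generalized) quasi-identifier values.
    Assumption for illustration: undersized groups are suppressed entirely.
    """
    group_sizes = Counter(rows)
    return sum(size for size in group_sizes.values() if size < k)

# Made-up data: three groups with 3, 2, and 1 records.
rows = (
    [("30-39", "130**")] * 3
    + [("40-49", "148**")] * 2
    + [("50-59", "109**")]
)

for k in (2, 3, 4):
    print(k, records_to_suppress(rows, k))
# Prints:
# 2 1
# 3 3
# 4 6
```

Under this assumption, raising k can only keep the number of suppressed records the same or increase it, which is why the choice of generalization strategy matters more as k grows.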

Teaching Aide and Structure

The teaching aide for this project is a PowerPoint presentation that provides:

Concept Overview
Algorithm Workflows
Comparative Analysis
Key Observations

While this project does not involve live coding, all experiments were conducted in Jupyter Notebook, which we highly recommend for running and analyzing k-anonymization tasks because it makes execution and visualization easy. The focus here is not on writing complex code but on understanding the trade-offs between the two approaches.

What You Will Learn

By engaging with this teaching aide, you will:

Understand k-anonymity and its significance in privacy protection.
See how MinGen Brute Force and Greedy approaches work in anonymizing data.
Analyze execution time and certainty metric for different dataset sizes and k-values.
Compare the efficiency and trade-offs of both algorithms.

Conclusion

This comparative study simplifies the complexities of k-anonymization into an accessible framework, helping you understand when to choose MinGen Brute Force and when to opt for the Greedy approach. By keeping the implementation simple and focusing on comparative insights, this project serves as a foundational guide for exploring data anonymization techniques. For those interested in running similar analyses, Jupyter Notebook is an excellent tool for testing different generalization strategies, adjusting k-values, and observing their impact on execution time and the certainty metric. By the end of this study, you’ll have a clear understanding of the strengths and weaknesses of each algorithm, helping you make informed decisions about privacy-preserving data processing.

Ready to dive in? Let’s explore k-anonymization together!

References

Walkthrough: K-Anonymization: A Comparative Analysis of MinGen and Greedy Algorithms
