Case Study on Data Mining with Privacy Preservation | Original Article
The primary issue examined in this research is that privacy-preserving data mining (PPDM) research has produced theoretical solutions and many peer-reviewed articles claiming to solve the problem, yet to gain any real benefit from these theoretical solutions, practitioners must convert that theory into practical software- and hardware-based solutions. This article begins with a review of data mining, privacy, and privacy-preserving data mining. It then reviews and analyzes the barriers that prevent widespread adoption of PPDM solutions, and concludes by presenting recommendations and ideas for future work.

Our proposal has two main advantages. First, as our experimental results also suggest, the perturbed data set maintains the same or very similar patterns as the original data set, as well as the correlations among attributes. While some noise-addition techniques maintain the statistical parameters of the data set, to the best of our knowledge this is the first comprehensive technique that preserves the patterns and thus removes the so-called data mining bias from the perturbed data set. Second, re-identification of the original records depends directly on the amount of noise added and can, in general, be made arbitrarily hard while still preserving the original patterns in the data set. The only exception is the case in which an intruder already knows enough about a record to learn its confidential class value by applying the classifier. However, this is always possible, even when the original record has not been used in the training data set. In other words, provided that enough noise is added, our technique makes the records from the training set as safe as any other previously unseen records of the same kind.
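The abstract does not specify the exact perturbation mechanism, but the core idea of adding noise while preserving correlations among attributes can be illustrated with a minimal sketch. The example below, which is our own illustration rather than the authors' method, draws additive noise from a multivariate Gaussian whose covariance is proportional to the data's own covariance; because the perturbed covariance is then a scalar multiple of the original, the correlation matrix is preserved in expectation, while the `noise_level` parameter (a hypothetical name) controls how hard re-identification becomes.

```python
import numpy as np

def perturb(data, noise_level=0.5, seed=None):
    """Add correlated Gaussian noise to a numeric data set.

    The noise covariance is noise_level * cov(data), so the perturbed
    data's covariance is (1 + noise_level) * cov(data): attribute
    correlations are preserved in expectation, while larger noise_level
    values make re-identification of individual records harder.
    Illustrative sketch only, not the article's actual technique.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    cov = np.cov(data, rowvar=False)
    noise = rng.multivariate_normal(
        mean=np.zeros(data.shape[1]),
        cov=noise_level * cov,
        size=data.shape[0],
    )
    return data + noise
```

A data miner would then train a classifier on `perturb(data)` instead of `data`; because the correlation structure (and, ideally, the patterns) survive the perturbation, the resulting model should resemble one trained on the original records, while individual rows no longer match any original record exactly.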