Operations Research
OPERATIONS RESEARCH,
Published online in Articles in Advance, August 17, 2009
DOI: 10.1287/opre.1090.0702

Against Classification Attacks: A Decision Tree Pruning Approach to Privacy Protection in Data Mining

Xiao-Bai Li, Sumit Sarkar

Department of Operations and Information Systems, University of Massachusetts Lowell, Lowell, Massachusetts 01854
School of Management, University of Texas at Dallas, Richardson, Texas 75080

xiaobai_li{at}uml.edu
sumit{at}utdallas.edu

Data-mining techniques can be used not only to study the collective behavior of customers, but also to discover private information about individuals. In this study, we demonstrate that decision trees, a popular classification technique for data mining, can be used to effectively reveal individuals' confidential data, even when the identities of the individuals are not present in the data. We propose a novel approach for organizations to protect confidential data from such a classification attack. The key components of this approach include a set of entropy-based measures to evaluate disclosure risks of individual records, an optimal pruning algorithm to identify high-risk records, and a pair of data-swapping procedures to reduce the disclosure risks. The proposed method provides the best trade-off between data utility and privacy protection against classification attacks. It can be applied to data with both numeric and categorical attributes. An experimental study on six real-world data sets shows that the proposed method is very effective in protecting privacy while enabling legitimate data mining and analysis.
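To make the idea of an entropy-based disclosure risk concrete, the sketch below computes the Shannon entropy of a confidential attribute within groups of records that a decision tree would place in the same leaf. It is only an illustration under stated assumptions, not the measures defined in the article: the leaf names, the sample values, the 0.5-bit threshold, and the function names `shannon_entropy` and `flag_high_risk_leaves` are all hypothetical. The intuition is that a leaf with low entropy lets an attacker predict the confidential value of its records with high confidence.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy, in bits, of the empirical distribution of labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def flag_high_risk_leaves(leaves, threshold):
    """Return the names of leaves whose confidential-value entropy is below threshold.

    Illustrative rule only: low entropy means the confidential value is
    nearly determined by the leaf, so the records in it are easier to disclose.
    """
    return [name for name, labels in leaves.items()
            if shannon_entropy(labels) < threshold]


# Hypothetical data: confidential values of the records falling in each leaf.
leaves = {
    "leaf_a": ["high", "high", "high", "high"],  # pure leaf, entropy 0
    "leaf_b": ["high", "low", "high", "low"],    # evenly mixed, entropy 1 bit
    "leaf_c": ["high", "high", "high", "low"],   # partly mixed, about 0.81 bits
}

# The threshold of 0.5 bits is an arbitrary choice for this example.
risky = flag_high_risk_leaves(leaves, threshold=0.5)
print(risky)
```

With these sample values only `leaf_a` is flagged, because every record in it shares the same confidential value. A protection step such as data swapping would then aim to raise the uncertainty in such leaves; how the article's procedures actually do this is not described in the abstract.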

Subject classifications: computers; databases/artificial intelligence; data mining; decision trees; pruning; public sector; society; privacy; probability; entropy; relative entropy.
History: Received August 2007; revision received June 2008; accepted November 2008.







Copyright © 2009 by INFORMS.