A Study on Various Approaches for Discrimination Avoidance in Data Mining
Data mining is an important technology for extracting useful patterns from large amounts of data. Two major issues in data mining are privacy violation and discrimination. Discrimination arises when people are treated unfairly on the basis of sensitive features such as gender, race, or religion. Discrimination is either direct or indirect. Direct discrimination consists of rules based on sensitive attributes such as religion, race, or community. Indirect discrimination occurs when decisions are based on non-sensitive attributes that are closely related to sensitive attributes. Automated data collection and data mining techniques such as classification rule mining are used by decision support systems to make automated decisions, for example in personnel selection and loan granting. If the training data sets are biased with respect to the sensitive features, discriminatory decisions may result. Antidiscrimination techniques, including discrimination discovery and prevention, have been introduced in data mining. The main purpose of this survey paper is to understand the existing approaches for discrimination prevention.

Automatic data collection has become the most sought-after method in the banking sector for making automatic decisions such as loan granting or denial. Discrimination present in the dataset leads to biased decisions. The discrimination can be either direct or indirect. Direct discrimination occurs when decisions are made based on sensitive attributes. Indirect discrimination occurs when decisions are made based on non-sensitive attributes that are closely related to sensitive ones. To overcome biased decisions, the proposed system applies anti-discrimination methodologies, which prevent discriminatory decisions arising from the dataset. The proposed system prevents discrimination without affecting data quality.
Data mining is an important technology for extracting useful information hidden in large collections of data. Discrimination refers to the unfair or unequal treatment of people based on their membership in a particular category or minority. Automated data collection and data mining techniques such as classification rule mining have paved the way for automated decisions such as loan granting or denial and insurance premium computation. If training data sets are biased with regard to discriminatory attributes such as gender or race, discriminatory decisions may ensue. For this reason, antidiscrimination techniques, including discrimination discovery and prevention, have been introduced in data mining. Discrimination can be either direct or indirect.
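
The distinction between direct and indirect discrimination can be made concrete with a small check on training data. The following is a minimal sketch, not a method taken from the surveyed approaches: the toy records, the column layout, and the helper names `grant_rates` and `rate_gap` are all hypothetical. It compares loan grant rates across the values of a sensitive attribute (gender) and across the values of a non-sensitive attribute (zip code) that happens to be closely related to it.

```python
from collections import defaultdict

# Hypothetical toy loan records: (gender, zip_code, decision).
# These values are invented for illustration only.
RECORDS = [
    ("female", "Z1", "deny"),
    ("female", "Z1", "deny"),
    ("female", "Z1", "grant"),
    ("female", "Z2", "deny"),
    ("male", "Z2", "grant"),
    ("male", "Z2", "grant"),
    ("male", "Z2", "grant"),
    ("male", "Z1", "deny"),
]

DECISION_COLUMN = 2


def grant_rates(records, column):
    """Return the fraction of 'grant' decisions for each value in `column`."""
    totals = defaultdict(int)
    grants = defaultdict(int)
    for row in records:
        key = row[column]
        totals[key] += 1
        if row[DECISION_COLUMN] == "grant":
            grants[key] += 1
    return {key: grants[key] / totals[key] for key in totals}


def rate_gap(rates):
    """Return the difference between the highest and lowest grant rate."""
    return max(rates.values()) - min(rates.values())


# Sensitive attribute: a large gap hints at direct discrimination in the data.
by_gender = grant_rates(RECORDS, 0)
# Non-sensitive attribute: a similar gap hints that zip code acts as a proxy.
by_zip = grant_rates(RECORDS, 1)

print("grant rate by gender:", by_gender, "gap:", rate_gap(by_gender))
print("grant rate by zip code:", by_zip, "gap:", rate_gap(by_zip))
```

In this toy data the gap for zip code is as large as the gap for gender, which illustrates why simply removing the sensitive attribute from the training set does not by itself prevent indirect discrimination.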