A Methodology for Sensitive Attribute Discrimination Prevention in Data Mining
Today, data mining is an increasingly important technology. It is the process of extracting useful knowledge from large collections of data. However, data mining also draws criticism, chiefly over potential privacy invasion and potential discrimination. Discrimination is the unequal or unfair treatment of people on the basis of the group they belong to. If data sets are divided on the basis of sensitive attributes such as gender, race, or religion, discriminatory decisions may ensue. For this reason, antidiscrimination laws have been complemented by discrimination prevention techniques for data mining.

Discrimination can be either direct or indirect. Direct discrimination occurs when decisions are made on the basis of sensitive attributes. It consists of rules or procedures that explicitly mention minority or disadvantaged groups through sensitive attributes related to group membership. Indirect discrimination occurs when decisions are made on the basis of non-sensitive attributes that are strongly correlated with biased sensitive ones. It consists of rules or procedures that, while not explicitly mentioning discriminatory attributes, could intentionally or unintentionally generate discriminatory decisions.
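To make the distinction between direct and indirect discrimination concrete, the following is a minimal Python sketch, not the methodology proposed here. It assumes a hypothetical rule representation (an antecedent of `(attribute, value)` pairs and a decision consequent), a hypothetical set `SENSITIVE` of sensitive attribute names, and a hypothetical threshold of 0.8. A rule is labeled "direct" if its antecedent names a sensitive attribute, and "indirect" if its non-sensitive antecedent predicts some sensitive value with high confidence in the data, i.e. acts as a proxy. Flagging a rule only marks it as a candidate; it does not measure how discriminatory the rule actually is.

```python
from dataclasses import dataclass

# Hypothetical rule representation: antecedent is a frozenset of
# (attribute, value) pairs, consequent is a (decision_attribute, value) pair.
@dataclass(frozen=True)
class Rule:
    antecedent: frozenset
    consequent: tuple

# Assumed sensitive attribute names (illustrative only).
SENSITIVE = {"gender", "race", "religion"}

def mentions_sensitive(rule, sensitive=SENSITIVE):
    """Direct case: the rule explicitly names a sensitive attribute."""
    return any(attr in sensitive for attr, _ in rule.antecedent)

def proxy_strength(items, records, sensitive=SENSITIVE):
    """Highest confidence with which the given items imply a single
    sensitive (attribute, value) pair among the matching records."""
    matching = [r for r in records if all(r.get(a) == v for a, v in items)]
    if not matching:
        return 0.0
    best = 0.0
    for attr in sensitive:
        counts = {}
        for r in matching:
            if attr in r:
                counts[r[attr]] = counts.get(r[attr], 0) + 1
        if counts:
            best = max(best, max(counts.values()) / len(matching))
    return best

def classify(rule, records, threshold=0.8):
    """Label a rule 'direct', 'indirect', or 'none' (threshold is assumed)."""
    if mentions_sensitive(rule):
        return "direct"
    if proxy_strength(rule.antecedent, records) >= threshold:
        return "indirect"
    return "none"

# Toy records in which zip code 10001 is a strong proxy for race "A".
records = [
    {"zip": "10001", "race": "A", "credit": "deny"},
    {"zip": "10001", "race": "A", "credit": "deny"},
    {"zip": "10001", "race": "A", "credit": "grant"},
    {"zip": "20002", "race": "B", "credit": "grant"},
    {"zip": "20002", "race": "A", "credit": "grant"},
]
r1 = Rule(frozenset({("race", "A")}), ("credit", "deny"))
r2 = Rule(frozenset({("zip", "10001")}), ("credit", "deny"))
r3 = Rule(frozenset({("zip", "20002")}), ("credit", "grant"))

print(classify(r1, records))  # direct
print(classify(r2, records))  # indirect
print(classify(r3, records))  # none
```

Here `r2` never mentions race, yet every record with zip code 10001 has race "A", so the rule can act on race indirectly. Using the majority sensitive value's confidence as the proxy measure is a simplifying assumption; other association or correlation measures could be substituted.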