Our current public policy posts, focused on ethics and bias in existing and emerging areas of AI, build on “A Survey on Bias and Fairness in Machine Learning” by Ninareh Mehrabi et al. and on resources provided by Barocas et al. The guest co-author of this series of blog posts on AI and bias is Farhana Faruqe, a doctoral student in the George Washington University Human-Technology Collaboration program. We look forward to your comments and suggestions.
Discrimination, unfairness, and bias are terms used frequently these days in the context of AI and data science applications that make decisions in the everyday lives of individuals and groups. Machine learning applications depend on datasets that usually reflect the real world, in which individuals hold intentional and unintentional biases that may cause discrimination and unfair actions. Broadly, fairness is the absence of any prejudice or favoritism toward an individual or a group based on their intrinsic or acquired traits in the context of decision-making.
Today’s blog post focuses on discrimination, which Ninareh Mehrabi, et al. describe as follows:
Direct Discrimination: “Direct discrimination happens when protected attributes of individuals explicitly result in non-favorable outcomes toward them.” Traits such as race, color, national origin, religion, sex, family status, disability, marital status, recipient of public assistance, and age are identified as sensitive or protected attributes in the machine learning world. It is illegal to discriminate on the basis of these attributes, which are listed in the Fair Housing Act (FHA) and the Equal Credit Opportunity Act (ECOA).
Indirect Discrimination: Indirect discrimination can occur even when sensitive or protected attributes are not used explicitly against an individual. For example, residential zip code is not categorized as a protected attribute, but a zip code can reveal a person’s race, which is a protected attribute. So, “protected groups or individuals still can get treated unjustly as a result of implicit effects from their protected attributes.”
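The zip code example can be made concrete with a small check of how well a seemingly neutral attribute predicts a protected one. The sketch below is only an illustration: the records, the group labels "A" and "B", and the function names are invented for this post, and it does not come from Mehrabi et al. It compares guessing a person's group from zip code alone against guessing without it.

```python
from collections import Counter, defaultdict

# Hypothetical records of (zip_code, group); invented for illustration only.
records = [
    ("20001", "A"), ("20001", "A"), ("20001", "A"), ("20001", "B"),
    ("20002", "B"), ("20002", "B"), ("20002", "B"), ("20002", "A"),
]

def proxy_accuracy(rows):
    """Share of rows whose group is guessed correctly from zip code alone,
    always guessing the most common group within that zip code."""
    by_zip = defaultdict(Counter)
    for zip_code, group in rows:
        by_zip[zip_code][group] += 1
    correct = sum(max(counts.values()) for counts in by_zip.values())
    return correct / len(rows)

def baseline_accuracy(rows):
    """Share guessed correctly when zip code is ignored entirely."""
    counts = Counter(group for _, group in rows)
    return max(counts.values()) / len(rows)

proxy = proxy_accuracy(records)        # 0.75 on this toy data
baseline = baseline_accuracy(records)  # 0.5 on this toy data
print(f"zip-based guess: {proxy:.2f}, baseline guess: {baseline:.2f}")
```

When the zip-based guess is much better than the baseline, as in this toy data, a model that never sees the protected attribute can still act on it indirectly through the zip code.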
Systemic Discrimination: In the nursing profession, the custom is to expect a nurse to be a woman. So, excluding qualified male nurses from nursing positions is an example of systemic discrimination. Systemic discrimination is defined as “policies, customs, or behaviors that are a part of the culture or structure of an organization that may perpetuate discrimination against certain subgroups of the population.”
Statistical Discrimination: In law enforcement, racial profiling is an example of statistical discrimination. In this case, minority drivers are pulled over more often than white drivers. The authors define statistical discrimination as “a phenomenon where decision-makers use average group statistics to judge an individual belonging to that group.”
Explainable Discrimination: In some cases, “discrimination can be explained using attributes” such as working hours and education, and a difference explained in this way is legal and acceptable. In a widely used dataset in the fairness domain, males on average have a higher annual income than females because on average females work fewer hours per week than males do. Decisions made without considering working hours could lead to discrimination.
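One way to see what "explained by working hours" means is to compare the income gap before and after grouping people by hours worked. The sketch below uses invented numbers, not the dataset the authors refer to, and the names `workers` and `mean_income` are hypothetical. It shows a case where the overall gap disappears once people with the same weekly hours are compared.

```python
from statistics import mean

# Hypothetical workers: (sex, hours_per_week, annual_income). Invented numbers.
workers = [
    ("M", 40, 40000), ("M", 40, 40000), ("M", 20, 20000),
    ("F", 40, 40000), ("F", 20, 20000), ("F", 20, 20000),
]

def mean_income(rows, sex, hours=None):
    """Average income for one sex, optionally restricted to one hours bracket."""
    return mean(
        income
        for s, h, income in rows
        if s == sex and (hours is None or h == hours)
    )

# Gap across everyone, ignoring hours worked.
raw_gap = mean_income(workers, "M") - mean_income(workers, "F")

# Gap within each hours bracket, comparing like with like.
gap_by_hours = {
    h: mean_income(workers, "M", h) - mean_income(workers, "F", h)
    for h in (20, 40)
}

print(f"raw gap: {raw_gap:.0f}")
print(f"gap within hours brackets: {gap_by_hours}")
```

In this toy data the raw gap is positive, but the gap within each hours bracket is zero, so the difference is fully explained by hours worked. A gap that remained within brackets would be the part that working hours cannot explain.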
Unexplainable Discrimination: Unlike explainable discrimination, this type of discrimination is not legal, because “the discrimination toward a group is unjustified.” Some researchers have introduced techniques applied during data preprocessing and training to remove unexplainable discrimination.
To understand bias in techniques such as machine learning, we will discuss another important aspect, fairness, in our next blog post.