Examining and Detecting Bias in a Credit Card Defaults Dataset
This section explores bias in a credit card defaults dataset, examining age-based discrimination that leads to higher default rates among the youngest and oldest borrowers.
There are many sources of bias in machine learning. Those rooted in the truths that the data represents, such as systemic and structural ones, lead to prejudice bias in the data. There are also biases rooted in the data, such as sample, exclusion, association, and measurement biases. Lastly, there are biases in the insights we derive from data or models we have to be careful with, such as conservatism bias, salience bias, and fundamental attribution error.
This section is an excerpt from my recent book, Interpretable Machine Learning with Python, Second Edition. You will find the code for this section here.
For this example, to properly disentangle so many levels of bias, we ought to connect our data (Yeh, I-C., & Lien, C-H. (2009). The comparisons of data mining techniques for the predictive accuracy of the probability of default of credit card clients. Expert Systems with Applications, 36(2), 2473-2480) to census data for Taiwan in 2005 and to historical lending data split by demographics. Then, using these external datasets, we would control for credit card contract conditions, as well as gender, income, and other demographic data, to ascertain whether young people, in particular, were targeted for high-interest credit cards they shouldn't have qualified for. We would also need to trace the dataset to its authors and consult with them and with domain experts to examine the dataset for bias-related data quality issues. Ideally, all of these steps would be taken to validate the hypothesis, but that would be a monumental task requiring several chapters' worth of explanation.
Therefore, in the spirit of expediency, we take the premise of this chapter at face value. That is, due to predatory lending practices, certain age groups are more vulnerable to credit card default, not through any fault of their own. We will also take the quality of the dataset at face value. With these caveats in place, it means that if we find disparities between age groups in the data or any model derived from this data, it can be attributed solely to predatory lending practices.
There are also two types of fairness, outlined here:
- Procedural fairness: This is about fair or equal treatment. It's hard to define this term legally because it depends so much on the context.
- Outcome fairness: This is solely about measuring fair outcomes.
These two concepts aren't mutually exclusive since the procedure may be fair but the outcome unfair, or vice versa. In this example, the unfair procedure was the offering of high-interest credit cards to unqualified customers. Nevertheless, we are going to focus on outcome fairness in this chapter.
When we discuss bias in machine learning, it impacts protected groups, and within these there are privileged and underprivileged groups. The latter are those adversely impacted by bias. There are also many ways in which bias manifests and is thus addressed, as follows:
- Representation: There can be a lack of representation or an overrepresentation of the underprivileged group. The model will learn either too little or too much about this group compared to others.
- Distribution: Differences in the distribution of features between groups can lead the model to make biased associations that can impact model outcomes either directly or indirectly.
- Probability: For classification problems, class balance discrepancies between groups can lead to the model learning that one group has a higher probability of being part of one class or another. These can be easily observed through confusion matrices or by comparing their classification metrics, such as false positive or false negative rates.
- Hybrid: A combination of any of the preceding manifestations.
Strategies for any bias manifestation are discussed in chapter 11 of the book, but the kind we address in this section pertains to probability disparities for our main protected attribute (AGE). We will observe disparities in the data for this protected feature through visualizations.
Without further ado, let's move on to the practical portion of this section.
Visualizing Dataset Bias
The data itself tells the story of how probable it is that one group belongs to the positive class versus another. For a categorical feature, these probabilities can be obtained by dividing the value_counts() of the positive class by the value_counts() over all classes. For instance, for gender, we could do this:
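The original snippet isn't reproduced here, but a minimal, self-contained reconstruction might look like this. The DataFrame name `ccdefault_df` is an assumption, and the toy data merely stands in for the real `GENDER` and `IS_DEFAULT` columns:

```python
import pandas as pd

# Toy stand-in for the credit card defaults DataFrame (assumed name and columns)
ccdefault_df = pd.DataFrame({
    'GENDER':     ['Male', 'Female', 'Male', 'Female', 'Male', 'Female'],
    'IS_DEFAULT': [1, 0, 1, 0, 0, 1],
})

# Counts of each gender among the positive (default) class...
defaults_by_gender = ccdefault_df[ccdefault_df['IS_DEFAULT'] == 1]['GENDER'].value_counts()
# ...divided by the counts of each gender over all classes
totals_by_gender = ccdefault_df['GENDER'].value_counts()
prob_default_by_gender = defaults_by_gender / totals_by_gender
print(prob_default_by_gender)
```

On the real dataset, this ratio is exactly what the text describes: the share of each gender that defaulted.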
The preceding snippet produces the following output, which shows that males have, on average, a higher probability of defaulting on their credit card:
The code for doing this for a continuous feature is a bit more complicated. It is recommended that you use pandas' qcut to divide the feature into quartiles first and then apply the same approach used for categorical features. Fortunately, the plot_prob_progression function does this for you and plots the progression of probabilities for each quartile. The first attribute is a pandas series, array, or list with the protected feature (AGE), and the second is the same but for the target feature (IS_DEFAULT). We then choose the number of intervals (x_intervals) that we are setting as quartiles (use_quartiles=True). The rest of the attributes are aesthetic, such as the label, the title (title='Probability of Default by Age'), and adding a mean_line.
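The snippet itself was lost above, but the computation behind plot_prob_progression can be sketched in plain pandas. Note that plot_prob_progression is a helper from the book's companion code, the call in the comment is reconstructed from the surrounding description, and the toy data below only stands in for the real AGE and IS_DEFAULT columns:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real columns; the age skew mimics the chapter's premise
rng = np.random.default_rng(0)
age = pd.Series(rng.integers(21, 80, size=1000), name='AGE')
p_default = np.where((age <= 25) | (age >= 47), 0.35, 0.20)
is_default = pd.Series(rng.binomial(1, p_default), name='IS_DEFAULT')

# plot_prob_progression(age, is_default, x_intervals=4, use_quartiles=True,
#                       mean_line=True, title='Probability of Default by Age')
# essentially computes and then plots the following:
quartiles = pd.qcut(age, q=4)  # bin AGE into quartiles
prob_by_quartile = is_default.groupby(quartiles, observed=True).mean()  # P(default) per bin
print(prob_by_quartile)    # one probability per age quartile
print(is_default.mean())   # overall mean, drawn as the mean line
```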
The preceding code produced the following output, which depicts how the youngest (21-25) and oldest (47-79) are most likely to default. All other groups are just over one standard deviation from the mean:
Figure 1: Probability of CC default by AGE
We can call the youngest and oldest quartiles the underprivileged group and all others the privileged group. In order to detect and mitigate unfairness, it is best to code them as a binary feature, and we have done just that with AGE_GROUP. We can leverage plot_prob_progression again, but this time with AGE_GROUP instead of AGE, and we will replace the numbers with labels we can interpret more easily. Among the aesthetic arguments, we set title='Probability of Default by Age Group'.
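Again, the snippet itself did not survive, so here is a self-contained sketch of the equivalent computation. The binarization cutoffs, group labels, and toy data are illustrative assumptions, not the book's exact values:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real columns, skewed per the chapter's premise
rng = np.random.default_rng(1)
age = pd.Series(rng.integers(21, 80, size=1000), name='AGE')
p_default = np.where((age <= 25) | (age >= 47), 0.35, 0.20)
is_default = pd.Series(rng.binomial(1, p_default), name='IS_DEFAULT')

# Code the protected feature as binary, then map to readable labels
age_group = pd.Series(
    np.where((age >= 26) & (age <= 46), 1, 0), name='AGE_GROUP'
).map({0: '21-25 & 47+', 1: '26-46'})

# plot_prob_progression(age_group, is_default,
#                       title='Probability of Default by Age Group')
# reduces to comparing two probabilities:
prob_by_group = is_default.groupby(age_group).mean()
print(prob_by_group)
```

Collapsing the quartiles into one binary feature makes the disparity a single, easily tested number rather than a curve.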
The preceding snippet produced the following output, in which the disparities between both groups are pretty evident:
Figure 2: Probability of CC default by AGE_GROUP
Next, let's bring GENDER back into the picture. We can employ plot_prob_contour_map, which is like plot_prob_progression but in two dimensions, color-coding the probabilities instead of drawing a line. The first two attributes are the features we want on the x-axis (GENDER) and y-axis (AGE_GROUP), and the third is the target (IS_DEFAULT). Since both our features are binary, it is best to use plot_type='grid' as opposed to a contour. Among the aesthetic arguments, we set title='Probability of Default by Gender/Age Group'.
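As before, the snippet was lost, so here is a sketch of the 2x2 grid that plot_prob_contour_map color-codes with plot_type='grid', built with a pandas crosstab. The toy data and effect sizes are made up to roughly match the figure's description:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real columns (labels and effect sizes are assumptions)
rng = np.random.default_rng(2)
n = 2000
gender = pd.Series(rng.choice(['Male', 'Female'], size=n), name='GENDER')
age_group = pd.Series(rng.choice(['26-46', '21-25 & 47+'], size=n), name='AGE_GROUP')
# Underprivileged ages default more; females default slightly less
p_default = (0.20 + 0.15 * (age_group == '21-25 & 47+').to_numpy()
                  - 0.035 * (gender == 'Female').to_numpy())
is_default = pd.Series(rng.binomial(1, p_default), name='IS_DEFAULT')

# plot_prob_contour_map(gender, age_group, is_default, plot_type='grid',
#                       title='Probability of Default by Gender/Age Group')
# color-codes this 2x2 grid of default probabilities:
grid = pd.crosstab(age_group, gender, values=is_default, aggfunc='mean')
print(grid)
```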
The preceding snippet generates the following output. It is immediately evident that the most privileged group is 26-47-year-old females, followed by their male counterparts, about 3-4% apart. The same happens within the underprivileged age group:
Figure 3: Probability grid of CC default by GENDER and AGE_GROUP
The gender difference is an interesting observation, and we could present a number of hypotheses as to why females default less. Are they simply better at managing debt? Does it have to do with their marital status or education? We won't dig deeper into these questions. Given that we only know of age-based discrimination, we will only use AGE_GROUP to define the privileged groups but keep GENDER as a protected attribute.
In this analysis, we uncovered clear evidence of bias against younger and older credit card holders, who were more likely to default on their cards. This was likely due to predatory lending practices that targeted these groups with high-interest credit they should not have qualified for. After visualizing the bias using probability plots and fairness metrics, mitigation strategies could be employed to create a more equitable machine learning model. However, more work is still needed to address the root causes and prevent such biased datasets from being created in the first place. Explore this topic further in the book Interpretable Machine Learning with Python, Second Edition.