This article explores two additional aspects that often need to be addressed during the scorecard development process: segmentation and reject inference (RI).
How many scorecards? What are the criteria? What is the best practice? These are the common questions we try to answer early in scorecard development, starting with identifying and justifying the number of scorecards, a process known as segmentation.
Figure 1: Scorecard segmentation
The initial segmentation pre-assessment is carried out during the business insights analysis. At this stage, the business should be informed about any heterogeneous population segments whose characteristics differ too much to be treated as a single group, so that it can make an early decision about accepting multiple scorecards.
The business drivers for segmentation are:
- Marketing, such as product offerings or new markets.
- Different treatments across different groups of customers, for example, based on demographics.
- Data availability, meaning that different data might be available through different marketing channels or some groups of customers might not have an available credit history.
The statistical drivers for segmentation assume that each segment has a sufficient number of observations, including both "good" and "bad" accounts, and that each segment contains interaction effects, meaning that predictive patterns vary across the segments.
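The sample-size condition can be checked before any modeling starts. The following is a minimal sketch, assuming each account is available as a `(segment, is_bad)` pair; the thresholds `MIN_GOODS` and `MIN_BADS`, the segment names, and the toy counts are hypothetical and not taken from the article.

```python
from collections import Counter

# Hypothetical minimum counts; the article gives no specific thresholds.
MIN_GOODS = 500
MIN_BADS = 500

def segment_counts(records):
    """Count good and bad accounts per segment.

    `records` is an iterable of (segment, is_bad) pairs, where is_bad is 1
    for a "bad" account and 0 for a "good" one.
    """
    counts = {}
    for segment, is_bad in records:
        per_segment = counts.setdefault(segment, Counter())
        per_segment["bad" if is_bad else "good"] += 1
    return counts

def viable_segments(records, min_goods=MIN_GOODS, min_bads=MIN_BADS):
    """Return the segments with enough goods and enough bads to model."""
    return sorted(
        seg for seg, c in segment_counts(records).items()
        if c["good"] >= min_goods and c["bad"] >= min_bads
    )

# Toy data: "thin_file" has too few bads to support its own scorecard.
records = (
    [("established", 0)] * 900 + [("established", 1)] * 600
    + [("thin_file", 0)] * 700 + [("thin_file", 1)] * 40
)
print(viable_segments(records))
```

A segment that fails this check is a candidate for merging with another segment rather than for a scorecard of its own.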
Typically, the segmentation process includes the following steps:
- Identify a simple segmentation schema using supervised or unsupervised segmentation.
  - For supervised segmentation, a decision tree is often used to identify the potential segments and capture interaction effects. Alternatively, residuals from an ensemble model can be used to detect interactions in the data.
  - Unsupervised segmentation, such as clustering, can be used to create the segments, but this method does not necessarily capture the interaction effects.
- Identify a set of candidate predictors for each of the segments.
- Build a separate model per segment.
- Test whether the segmented models have different predictive patterns. Failure to identify new predictive characteristics across segments indicates that the data scientist should search for a better segmentation split or build a single model.
- Test whether the segmented models have similar predictive patterns but with significantly different magnitudes or opposing effects across the segments.
- Test whether the segmented models produce a superior lift in predictive power compared to a single model built on the entire population.
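The last check in the list above, lift in predictive power, can be sketched as a comparison of Gini coefficients on the same holdout accounts. This is an illustration under assumptions: the article does not name a metric, so Gini (2 × AUC − 1) is my choice, and the labels and scores below are made-up toy values. `segmented_scores` stands for the scores from the per-segment models pooled back into one list.

```python
def auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) formula.

    `scores` are predicted probabilities of "bad"; `labels` are 1 for bad and
    0 for good. Tied scores receive their average rank.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_bad = sum(labels)
    n_good = len(labels) - n_bad
    rank_sum_bad = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_bad - n_bad * (n_bad + 1) / 2) / (n_bad * n_good)

def gini(scores, labels):
    """Gini coefficient derived from AUC."""
    return 2 * auc(scores, labels) - 1

# Toy holdout sample (hypothetical values).
labels = [0, 0, 1, 0, 1, 1, 0, 1]
single_scores = [0.10, 0.30, 0.25, 0.40, 0.35, 0.80, 0.20, 0.60]
segmented_scores = [0.10, 0.20, 0.28, 0.30, 0.50, 0.80, 0.15, 0.70]

lift = gini(segmented_scores, labels) - gini(single_scores, labels)
print(f"Gini lift from segmentation: {lift:.3f}")
```

Whether a given lift is large enough to justify multiple scorecards remains a business judgment; the code only quantifies the difference.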
Segmentation is an iterative process that requires constant judgment to determine whether to use a single scorecard or multiple segmented scorecards. In practitioners' experience, segmentation rarely results in a significant lift, and every effort should be made to produce a single scorecard. Common methods used to avoid segmentation include adding variables to the logistic regression to capture interaction effects, or identifying the most predictive variables per segment and combining them into a single model.
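As an illustration of the first method, consider a single predictor $x$ and a binary segment indicator $s$ (both symbols are introduced here for the example and do not come from the article). A single logistic regression with an interaction term could take the form:

```latex
\operatorname{logit} P(\text{bad}) = \beta_0 + \beta_1 x + \beta_2 s + \beta_3 (x \cdot s)
```

For accounts with $s = 0$ the effect of $x$ is $\beta_1$, while for accounts with $s = 1$ it is $\beta_1 + \beta_3$, so one model can carry a different slope per segment without a separate scorecard.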
Separate scorecards are usually built independently. However, if the reliability of model factors is an issue, a parent/child model may offer an alternative approach. In this approach, we develop a parent model on the common characteristics and use its output as a predictor in each child model, alongside the characteristics unique to that child segment.
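One possible way to write the parent/child structure down, assuming both levels are logistic regressions and the parent's log-odds is the quantity passed to the children (the article does not specify either choice, and all symbols are illustrative):

```latex
\begin{aligned}
z_{\text{parent}} &= \alpha_0 + \boldsymbol{\alpha}^{\top} \mathbf{x}_{\text{common}} \\
\operatorname{logit} P(\text{bad} \mid \text{child } k) &= \gamma_{0k} + \gamma_{1k}\, z_{\text{parent}} + \boldsymbol{\beta}_k^{\top} \mathbf{x}_{k,\text{unique}}
\end{aligned}
```

Here $\mathbf{x}_{\text{common}}$ holds the characteristics shared by all segments and $\mathbf{x}_{k,\text{unique}}$ those available only in child segment $k$.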
The primary aim of multiple scorecards is to improve the quality of risk assessment when compared to a single scorecard. Segmented scorecards should only be used if they offer significant value to the business that outweighs the higher development and implementation cost, the complexity of the decision management process, additional management of scorecards, and greater use of IT resources.
Application scorecards have a naturally occurring selection bias if the modeling is based solely on the accepted population with known performance, because a significant group of rejected customers is excluded from the modeling process when their performance is unknown. To address the selection bias, application scorecard models should include both populations. This means that the unknown performance of the rejects needs to be inferred, which is done using reject inference (RI).
Figure 2: Accepts and rejects populations
With or without reject inference? There are two schools of thought: those who think that RI is a vicious circle, where the inferred performance of the rejects would be based on the approved but biased population, which consequently leads to less reliable reject inference; and those who advocate RI methodology as a valuable approach that benefits the model's performance.
There are a few extra steps required during the scorecard development if using RI:
- Build a logistic regression model on the accepts (this is the
- Infer the rejects using a reject inference technique.
- Combine the accepts and the inferred rejects into a single dataset (
- Build a new logistic regression model on complete_population (this is the
- Validate the
- Create a scorecard model based on the
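The first four steps above can be sketched end to end. This is a minimal illustration under stated assumptions, not a definitive implementation: `fit_logit` is a toy gradient-descent logistic regression standing in for a real statistics library, the inference technique is simple augmentation with a hard cutoff (my choice for brevity), and the names `base_model`, `final_model`, `CUTOFF`, and the synthetic data are all hypothetical. Validation and scorecard creation are not shown.

```python
import math
import random

def predict_bad(beta, x):
    """Probability of "bad" from a logistic model; beta[0] is the intercept."""
    z = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logit(X, y, lr=0.5, epochs=500):
    """Toy logistic regression fitted by batch gradient descent."""
    beta = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(beta)
        for xi, yi in zip(X, y):
            err = predict_bad(beta, xi) - yi
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        beta = [b - lr * g / len(X) for b, g in zip(beta, grad)]
    return beta

random.seed(0)
# Synthetic data with one feature scaled to [0, 1]; higher means riskier.
accepts_X = [[random.uniform(0.0, 0.6)] for _ in range(200)]
accepts_y = [1 if random.random() < 0.1 + 0.5 * x[0] else 0 for x in accepts_X]
rejects_X = [[random.uniform(0.5, 1.0)] for _ in range(100)]  # performance unknown

# Step 1: build a model on the accepts only.
base_model = fit_logit(accepts_X, accepts_y)

# Step 2: infer the rejects (simple augmentation: hard cutoff on the base score).
CUTOFF = 0.3  # hypothetical cutoff on the predicted probability of "bad"
rejects_y = [1 if predict_bad(base_model, x) >= CUTOFF else 0 for x in rejects_X]

# Step 3: combine accepts and inferred rejects into a single dataset.
complete_population = list(zip(accepts_X + rejects_X, accepts_y + rejects_y))

# Step 4: build a new model on the complete population.
final_model = fit_logit(
    [x for x, _ in complete_population],
    [y for _, y in complete_population],
)
print("base:", base_model, "final:", final_model)
```

Because the inferred labels come from the base model itself, the final model inherits its assumptions, which is the circularity that RI sceptics point to.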
Figure 3: Scorecard development using reject inference
Reject inference is a form of missing values treatment where the outcomes are "missing not at random" (MNAR), resulting in significant differences between accepted and rejected populations. There are two broad approaches used to infer the missing performance: assignment and augmentation, each having a different set of techniques. The most popular techniques within the two approaches are proportional assignment, simple and fuzzy augmentation, and parceling.
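Two of these techniques can be sketched as they are commonly described in the credit-scoring literature; the exact variants the article has in mind may differ. In fuzzy augmentation each reject contributes a partial "bad" and a partial "good" record, weighted by the base model's predicted probabilities (some variants scale the weights further). In parceling, rejects are grouped into score bands and labelled at random according to a bad rate assumed for each band. The function names, band edges, and rates below are hypothetical.

```python
import random

def fuzzy_augmentation(rejects, p_bad):
    """Split each reject into a weighted "bad" row and a weighted "good" row.

    Returns (record, label, weight) triples, where label 1 = bad, 0 = good,
    and the two weights for a given reject sum to 1.
    """
    rows = []
    for record, p in zip(rejects, p_bad):
        rows.append((record, 1, p))
        rows.append((record, 0, 1.0 - p))
    return rows

def parceling(rejects, p_bad, band_edges, band_bad_rates, rng):
    """Assign good/bad labels at random within score bands.

    `band_edges` are ascending upper bounds on p_bad for every band but the
    last; `band_bad_rates[i]` is the bad rate assumed for rejects in band i,
    for example the accepts' observed bad rate in that band with an uplift.
    """
    labelled = []
    for record, p in zip(rejects, p_bad):
        band = sum(p > edge for edge in band_edges)  # index of the band holding p
        label = 1 if rng.random() < band_bad_rates[band] else 0
        labelled.append((record, label))
    return labelled

rejects = ["r1", "r2", "r3"]
p_bad = [0.10, 0.40, 0.90]  # hypothetical base-model probabilities of "bad"

print(fuzzy_augmentation(rejects, p_bad))
print(parceling(rejects, p_bad, [0.2, 0.5], [0.05, 0.30, 0.70], random.Random(42)))
```

The weighted rows from `fuzzy_augmentation` are meant to be passed to a model fit that accepts observation weights, while the output of `parceling` can be appended to the accepts directly.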
Table 1: Reject inference techniques
Figure 4: Proportional assignment
Figure 5: Simple augmentation
Figure 6: Fuzzy augmentation
Figure 7: Parceling
To be continued...