The real benefit of a scorecard or a credit strategy only becomes evident in implementation. The final stage of the CRISP-DM framework, deployment, represents the transition from the data science domain to the information technology domain. Consequently, roles and responsibilities also shift from data scientists and business analysts to system and database administrators and testers.
Prior to scorecard implementation, a number of technology decisions must be made. These decisions cover:
Which hardware and software will be used
Who has responsibility for scorecard implementation
Who is responsible for scorecard maintenance
Whether production is in-house or outsourced
Scorecard implementation is a sequential process that is initiated once the scorecard model has been signed off by the business. The process starts with the generation of the scorecard deployment code, followed by pre-production, production, and post-production.
Figure 1: Scorecard implementation stages
Deployment code is created by translating a conceptual model, such as a model equation or a tabular form of a scorecard, into an equivalent software artifact ready to run on a server. The implementation platform on which the model will run determines the deployment language, which could be, for example, the SAS language (Figure 2), SQL, PMML, or C++. Writing model deployment code can be error-prone and often represents a bottleneck, as a number of code-refinement cycles are usually needed to produce correct deployment code. Some analytical vendors offer automatic generation of deployment code in their software, a desirable feature that produces error-free code and shortens both the deployment time and the code testing cycle.
Figure 2: Automatic generation of SAS language deployment code with World Programming software
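The translation from a tabular scorecard to deployment code can be sketched as follows. This is a minimal illustration in Python rather than a production deployment language; the characteristics, score bands, and point values are entirely hypothetical.

```python
# Hypothetical tabular scorecard: each characteristic maps to a list of
# (lower_bound, upper_bound) bins with the points awarded for that bin.
SCORECARD = {
    "age": [((18, 25), 10), ((25, 40), 25), ((40, 120), 40)],
    "income": [((0, 20000), 5), ((20000, 60000), 20), ((60000, float("inf")), 35)],
}
BASE_POINTS = 400  # intercept / base score (illustrative)

def score(applicant):
    """Deployment code equivalent of the scorecard table:
    add the points of the bin each characteristic falls into."""
    total = BASE_POINTS
    for characteristic, bins in SCORECARD.items():
        value = applicant[characteristic]
        for (low, high), points in bins:
            if low <= value < high:
                total += points
                break
    return total

print(score({"age": 30, "income": 45000}))  # 400 + 25 + 20 = 445
```

Automatic code generation performs exactly this kind of translation, but emits it in the target platform's language, which removes the manual coding step where errors typically creep in.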
Scorecard implementation, whether on a pre-production server for testing or a production server for real-time scoring, requires an API wrapper placed around the model deployment code to handle remote requests for model scoring. Model inputs, provided from internal and external data sources, can be extracted either outside or inside the scoring engine. The former runs variable extraction outside the scoring engine and passes the variables as parameters of the API request. The latter, as depicted in Figure 3, runs pre-processing code inside the scoring engine, carrying out variable extraction and model scoring on the same engine.
Figure 3: Real-time scoring using API call
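The two extraction patterns can be sketched with a single request handler. This is an assumption-laden Python sketch, not a real scoring-engine API: the model formula, variable names, and payload shape are all hypothetical, and a real wrapper would sit behind an HTTP endpoint rather than a plain function.

```python
import json

def score_model(variables):
    # Placeholder for the generated deployment code (hypothetical formula).
    return 400 + 2 * variables["months_on_book"] - 50 * variables["num_delinquencies"]

def preprocess(raw_record):
    # Pattern 2: variable extraction runs inside the scoring engine,
    # deriving model inputs from the raw record.
    return {
        "months_on_book": raw_record["months_on_book"],
        "num_delinquencies": len(raw_record.get("delinquency_dates", [])),
    }

def handle_request(request_body, extract_inside=True):
    """API wrapper: parse the remote request, obtain model inputs, score."""
    payload = json.loads(request_body)
    # Pattern 1 (extract_inside=False): the caller has already extracted the
    # variables and passes them directly as request parameters.
    variables = preprocess(payload) if extract_inside else payload
    return json.dumps({"score": score_model(variables)})

print(handle_request('{"months_on_book": 12, "delinquency_dates": ["2023-01-05"]}'))
```

Keeping extraction inside the engine (pattern 2) centralizes the data logic but couples the engine to the data sources; passing pre-extracted variables (pattern 1) keeps the engine simple but duplicates extraction logic in every caller.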
Pre-Production and Production
Pre-production is an environment used to run a range of tests before committing the model to the (live) production environment. These tests typically include model evaluation and validity tests, system tests that measure request and response times under anticipated peak load, and installation and system configuration tests.
Thoroughly tested and approved models are uploaded to the production environment: the final destination. Models running on a production server can be in an active or a passive state. Active models are champion models whose scores are used in real-time decision-making, such as credit approval or rejection. Passive models are typically challenger models not yet used in decision-making; their scores are recorded and analyzed over a period of time to justify their business value before they become active models.
Every model degrades over time as a result of natural evolution driven by many factors, including new product launches, marketing incentives, and economic drift. Hence, regular model monitoring is imperative to prevent any negative effect on the business.
Model monitoring is post-implementation testing used to determine if models continue to be in line with expected performance. IT infrastructure needs to be set up in advance to enable monitoring by facilitating generation of model reports, a repository for storing reports, and a monitoring dashboard.
Figure 4: Model monitoring process
Model reports can be used to, for example, identify if the characteristics of new applicants change over time; establish if the score cut-off value needs to be changed to adjust acceptance rate or default rate; or determine if the scorecard ranks the customer in the same way as it ranked the modelling population across different risk bands.
Scorecard degradation is typically captured using pre-defined threshold values. Depending on the magnitude of change, a relevant course of action is taken. For example, minor changes in scorecard performance metrics can be ignored, but moderate changes might require more frequent monitoring or scorecard recalibration. Any major change requires rebuilding the model or swapping to the best performing alternate model.
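The threshold logic above can be sketched as a simple mapping from a drift metric to an action. The 0.10 and 0.25 break-points are the values commonly quoted for the population stability index, but they are illustrative here; each lender defines its own thresholds per metric.

```python
def monitoring_action(psi):
    """Map a population stability index value to a monitoring action,
    using commonly quoted (but illustrative) thresholds."""
    if psi < 0.10:
        return "no action"            # minor change: ignore
    if psi < 0.25:
        return "monitor more frequently or recalibrate"  # moderate change
    return "rebuild model or promote best challenger"    # major change

print(monitoring_action(0.18))
```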
Credit risk departments have access to an extensive array of reports, including a variety of drift reports, performance reports, and portfolio analyses (Table 1). The two most typical examples are population stability and performance tracking. Population stability measures the change in the distribution of credit scores in the population over time. The stability report generates an index that indicates the magnitude of change in customer behavior as a result of changes in the population; any significant shift triggers an alert requesting a model redesign. A performance tracking report is a back-end report that requires sufficient time for customer accounts to mature so that customer performance can be assessed. Its purpose is two-fold: firstly, it tests the power of the scorecard by assessing whether the scorecard is still able to rank customers by risk, and secondly, it tests accuracy by comparing the expected default rates known at the time of modeling with current default rates.
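The population stability index itself is computed from the score distribution: for each score band, compare the share of the modelling (expected) population with the share of the recent (actual) population. A minimal sketch, with illustrative band proportions:

```python
import math

def psi(expected, actual):
    """Population stability index over score bands.
    expected/actual are the per-band proportions of each population
    (each list sums to 1); PSI = sum((a - e) * ln(a / e))."""
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))

# Illustrative five-band score distributions.
expected = [0.10, 0.20, 0.40, 0.20, 0.10]  # modelling population
actual   = [0.12, 0.22, 0.38, 0.18, 0.10]  # recent applicants

print(round(psi(expected, actual), 4))  # a small shift, well below alert levels
```

In practice, bands with a zero count need smoothing before the logarithm is taken, and the same formula applied to individual characteristics (rather than the final score) yields the characteristic-level drift reports.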
Table 1: Scorecard monitoring reports
The challenge with model monitoring is the prolonged time lag between a change request and its implementation. The complexity of the tasks required to support the monitoring process for every model running in the production environment (Figure 4), including code to generate reports, access to the relevant data sources, model management, report schedulers, model degradation alerts, and report visualization, makes the process demanding and challenging. This has been the main motivation for lenders either to outsource model monitoring or to invest in an automated process that supports monitoring with minimal human effort.