Credit Edge

Credit-Risk Models Based on Machine Learning: A 'Middle-of-the-Road' Solution

Even though machine-learning technology has been around for some time now, financial institutions' appetite for complex, ML-driven credit risk models remains limited. However, explainable hybrid models that use a combination of ML-engineered features and traditional logistic regression are growing in popularity.

Friday, September 10, 2021

By Marco Folpmers and Linda Torn


Financial institutions are using machine-learning-driven models for everything from anti-money laundering to fraud protection. However, while ML models have undoubtedly yielded gains in these areas, there are still concerns about their explainability, bias and interpretability - particularly when compared with more traditional approaches. These concerns help explain why, to date, these models have been slow to gain adoption in credit risk.

The low explainability of ML-driven models for credit risk remains, perhaps, their greatest drawback. A visual inspection of, say, a random forest is impossible, and although there are some tools (like feature importance) that provide information about the inner workings of this type of model, ML model logic is significantly more complicated than that of a traditional logistic regression approach.

However, we're increasingly seeing “middle-of-the-road” solutions that incorporate ML-engineered features within an easier-to-explain logistic regression model. Under this approach, ML is used to identify highly predictive features (for, say, probability of default, or PD), which are then integrated into the so-called “logit” model. This hybrid model would include both original and ML-engineered features, and an automated algorithm would select the features used to forecast PD.
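For reference, such a hybrid PD model can be written as a standard logit in which original drivers x_ij and ML-engineered dummies d_ik of obligor i enter side by side. The notation is ours: Y_i is the default indicator, and S_x and S_d stand for whichever original and engineered features the selection algorithm retains.

```latex
\mathrm{PD}_i = \Pr(Y_i = 1 \mid x_i, d_i)
  = \frac{1}{1 + \exp\!\Big(-\big(\beta_0 + \sum_{j \in S_x} \beta_j\, x_{ij} + \sum_{k \in S_d} \gamma_k\, d_{ik}\big)\Big)}
```

Because each d_ik is 0 or 1, exp(γ_k) is the multiplicative change in the default odds when the engineered condition holds, all else equal, which is what keeps the hybrid model explainable.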

Performance-driven features can be added to this model through Sequential Forward Selection (SFS), one of the most widely used feature-selection algorithms. SFS starts from an empty model and, at each step, adds the single feature that most improves performance. To prevent an unnecessary abundance of features from being selected, and to retain only those that perform optimally, a penalty on the number of features chosen can be applied.
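A minimal sketch of penalized SFS, assuming a fixed penalty per selected feature and a pluggable score function. The function name `penalized_sfs`, the penalty values and the `toy_score` gains are hypothetical; in a real PD model the score would typically be a cross-validated AUC of the logit fitted on each candidate subset.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence


def penalized_sfs(
    candidates: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
    penalty: float,
) -> List[str]:
    """Sequential forward selection with a fixed penalty per selected feature.

    Starting from the empty set, repeatedly add the candidate that maximizes
    score(subset) - penalty * len(subset), and stop as soon as no remaining
    candidate improves this penalized objective.
    """
    selected: List[str] = []
    remaining = list(candidates)
    best_obj = score(frozenset())  # empty model, zero penalty
    while remaining:
        # Penalized objective of every one-feature extension of the current subset.
        trials = [
            (score(frozenset(selected + [c])) - penalty * (len(selected) + 1), c)
            for c in remaining
        ]
        obj, best_c = max(trials)
        if obj <= best_obj:
            break  # no extension pays for its penalty
        selected.append(best_c)
        remaining.remove(best_c)
        best_obj = obj
    return selected


# Hypothetical stand-in for a real performance metric (e.g. the cross-validated
# AUC of a logit fit): each feature adds a fixed, made-up gain to a 0.5 baseline.
GAINS: Dict[str, float] = {"ltv": 0.10, "income": 0.06, "age": 0.01, "region": 0.005}


def toy_score(subset: FrozenSet[str]) -> float:
    return 0.5 + sum(GAINS[f] for f in subset)


print(penalized_sfs(list(GAINS), toy_score, penalty=0.02))  # ['ltv', 'income']
print(penalized_sfs(list(GAINS), toy_score, penalty=0.0))   # ['ltv', 'income', 'age', 'region']
```

A larger penalty stops the search earlier (with `penalty=0.08` only `ltv` survives), which is the trade-off between parsimony and performance that the penalty is meant to control.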

Feature Engineering

Let's take one step back. Before SFS can be applied to the expanded feature set, the new features themselves need to be generated. This step is generally referred to as “feature engineering,” and ML can also play a role here. One can think of these ML-engineered features either as combinations of existing features (e.g., Loan-to-Value combined with geographic area for a mortgage loan portfolio) or as highly predictive dummies signaling whether a risk driver is within an especially relevant domain (e.g., “Loan-to-Value > 0.80”).
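To make the two kinds of engineered features concrete, here is a small sketch under stated assumptions: the `Loan` record, its fields and the feature names are hypothetical, and the cut-offs are hard-coded rather than learned (in the approach discussed here they would come from an ML procedure such as CART).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Loan:
    ltv: float  # loan-to-value ratio
    area: str   # hypothetical geographic label, e.g. "rural" or "urban"


def engineer_features(loan: Loan) -> Dict[str, int]:
    """Two illustrative engineered features built from existing drivers."""
    rural = loan.area == "rural"
    return {
        # Dummy flagging an especially relevant domain of a single driver.
        "ltv_gt_080": int(loan.ltv > 0.80),
        # Combination of two existing features: geographic area crossed with an LTV band.
        "rural_and_ltv_060_080": int(rural and 0.60 < loan.ltv <= 0.80),
    }


portfolio: List[Loan] = [Loan(0.85, "rural"), Loan(0.70, "rural"), Loan(0.70, "urban")]
for loan in portfolio:
    print(loan, engineer_features(loan))
```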

Feature engineering is both an art and a science. It can draw on the domain knowledge of experts, and one of its advantages is that it lets model developers perform a focused search for potentially powerful new features. However, trial-and-error feature engineering can be very labor-intensive, and expert bias can lead to suboptimal solutions.

A potential remedy to these issues is to use the classification and regression trees (CART) algorithm. CART can help define powerful features that are subsequently blended, as an extended feature set, into a logit model, and there is some evidence that the resulting model may outperform both CART itself and the logit benchmark model (see Brezigar-Masten and Masten, 2012).
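The core of CART is a search for the cut-off that best separates defaulters from non-defaulters. The sketch below shows that single step for one driver using Gini impurity, the usual CART split criterion. The data are made up, and a full CART repeats this search over all drivers and recursively within each child node (in practice via a library implementation such as scikit-learn's `DecisionTreeClassifier`).

```python
import math
from typing import List, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of a group of 0/1 default flags."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(x: List[float], y: List[int]) -> Tuple[float, float]:
    """CART-style search for the cut-off t minimizing the weighted Gini impurity
    of the two groups 'x <= t' and 'x > t'. Returns (t, impurity)."""
    values = sorted(set(x))
    best_t, best_imp = math.nan, math.inf
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2  # candidate cut-offs: midpoints between observed values
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        imp = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if imp < best_imp:
            best_t, best_imp = t, imp
    return best_t, best_imp


# Hypothetical toy data: LTV ratios and observed default flags.
ltv = [0.40, 0.55, 0.65, 0.75, 0.85, 0.90, 0.95]
default = [0, 0, 0, 0, 1, 1, 1]

t, imp = best_split(ltv, default)
# The split becomes a candidate dummy for the logit model, here "LTV > 0.80".
ltv_dummy = [int(v > t) for v in ltv]
print(round(t, 2), imp, ltv_dummy)
```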

There are, however, settings in which a feature set enhanced with CART-based dummies does not outperform a logit model built on the original drivers. Rather, it attains the same performance while using fewer drivers.

CART-Based Dummies and Super Parameters

The CART model can be represented as a decision tree in which the independent variables are split at certain cut-off points (e.g., “Loan-to-Value > 0.80”). The leaf nodes in the CART model represent the decision - e.g., the “yes” or “no” default prediction - for PD models.

Using lower-level nodes, multiple relevant LTV ranges (e.g., “0.60 < Loan-to-Value ≤ 0.80”) can be detected, as long as these effectively contribute to CART-model performance. By working backward from the leaf nodes, complex risk drivers can be constructed, e.g., “obligor lives in rural area” AND “0.60 < Loan-to-Value ≤ 0.80.”
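Working backward from the leaves amounts to collecting the split conditions on each root-to-leaf path. The sketch assumes a hypothetical, already-fitted tree stored in a small `Node` class (not any library's tree format) and turns each path into a 0/1 dummy that could be offered to the logit model.

```python
import operator
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Condition = Tuple[str, str, float]  # (feature, "<=" or ">", threshold)
OPS = {"<=": operator.le, ">": operator.gt}


@dataclass
class Node:
    """Binary tree node; a leaf has feature=None."""
    feature: Optional[str] = None
    threshold: float = 0.0
    left: Optional["Node"] = None   # branch where feature <= threshold
    right: Optional["Node"] = None  # branch where feature > threshold


def leaf_rules(node: Node, path: Tuple[Condition, ...] = ()) -> List[Tuple[Condition, ...]]:
    """Return, for every leaf, the AND-combination of split conditions leading to it."""
    if node.feature is None or node.left is None or node.right is None:
        return [path]
    return (leaf_rules(node.left, path + ((node.feature, "<=", node.threshold),))
            + leaf_rules(node.right, path + ((node.feature, ">", node.threshold),)))


def rule_dummy(rule: Tuple[Condition, ...], record: Dict[str, float]) -> int:
    """1 if the record satisfies every condition of the rule, else 0."""
    return int(all(OPS[op](record[f], t) for f, op, t in rule))


# Hypothetical fitted tree: split on a rural flag (0/1), then on LTV within the rural branch.
tree = Node("rural", 0.5,
            left=Node(),
            right=Node("ltv", 0.80,
                       left=Node("ltv", 0.60, left=Node(), right=Node()),
                       right=Node()))

rules = leaf_rules(tree)
for rule in rules:
    print(" AND ".join(f"{f} {op} {t}" for f, op, t in rule))
```

The third rule, `rural > 0.5 AND ltv <= 0.8 AND ltv > 0.6`, is the compound driver “obligor lives in rural area” AND “0.60 < Loan-to-Value ≤ 0.80.” Redundant conditions such as `ltv <= 0.8 AND ltv <= 0.6` could be simplified before the dummies are passed to SFS.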

For the middle-of-the-road ML solution, these engineered features can then be entered into the logit model - supplementing the model's original set of features. In practice, this can easily inflate the number of variables to be used in an SFS-driven logit model by a factor of 30 - say, from 50 features to 1,500 features.

In a hybrid model that incorporates ML, instead of using a subset of the original features, as a traditional logit model would, one can employ just a few ML-engineered “super parameters” that deliver high performance. In the more explainable hybrid model, super parameters can be interpreted by breaking them down into their constituent components, such as “Obligor lives in rural area” AND “0.60 < Loan-to-Value ≤ 0.80.”

Super Parameters: Advantages and Challenges

Advantage: ML is used for the automated generation of super parameters.
Challenge: Are these super parameters themselves stable?

Expanding the feature space:
Advantage: ML is potentially able to identify highly predictive combinations of parameters.
Challenge: This is a brute-force method. If the potential variable space is significantly expanded (by, say, a factor of 30), SFS procedures for building the logit model can become unstable.

Advantage: With a super parameter, one reaches a high AUC using fewer variables.
Challenge: Is it fair to count a highly synthetic parameter as only one?

Advantage: Brute-force engineering algorithms identify relationships from the past.
Challenge: Experts can use other approaches to better identify new potential determinants (e.g., COVID-19).

An Imperfect Approach

Overall, hybrid, ML-fueled credit risk models are more parsimonious than traditional models and more explainable than AI-based approaches. But are these “middle-of-the-road” solutions the ideal option? The answer is no.

SFS is known to be unstable when it has to search a large feature space. Cross-validation (fitting on multiple sub-samples) is needed to make sure that a stable solution has been reached. Moreover, routine SFS implementations sometimes have difficulty processing very large datasets.
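One simple way to check whether SFS has reached a stable solution is to rerun it on several sub-samples and measure how much the selected feature sets agree. The sketch below assumes those runs have already been done; the feature sets are hypothetical, and selection frequency plus mean pairwise Jaccard similarity is our choice of stability summary, not a prescribed measure.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Sequence, Set


def selection_frequency(runs: Sequence[Set[str]]) -> Dict[str, float]:
    """Share of sub-sample runs in which each feature was selected."""
    counts = Counter(f for run in runs for f in run)
    return {f: c / len(runs) for f, c in counts.items()}


def mean_pairwise_jaccard(runs: Sequence[Set[str]]) -> float:
    """Average Jaccard similarity |A & B| / |A | B| over all pairs of runs (1.0 = identical).
    Requires at least two runs."""
    pairs = list(combinations(runs, 2))
    sims = [len(a & b) / len(a | b) if a | b else 1.0 for a, b in pairs]
    return sum(sims) / len(sims)


# Hypothetical feature sets selected by SFS on three cross-validation folds.
runs: List[Set[str]] = [
    {"ltv", "income", "rural_and_ltv_060_080"},
    {"ltv", "income"},
    {"ltv", "income", "age"},
]
print(selection_frequency(runs))
print(round(mean_pairwise_jaccard(runs), 3))  # 0.611
```

Features selected in every fold (here `ltv` and `income`) form the stable core; an engineered feature that appears in only one fold is a warning sign.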

Another concern is the stability of the engineered features themselves. The more complex a feature is, the greater the risk that it is overengineered, i.e., fitted too closely to the development data. This concern can, of course, be addressed by cross-validating the CART procedure.

However, in these uncertain times, the effectiveness of a highly predictive, synthetic “super parameter” is unclear. Models that were constructed before the pandemic may be driven by feature engineering procedures that rely on pre-2020 data, which obviously does not include pandemic drivers of risk. Experts can more easily identify such risk factors than fancy models with data-deficient super parameters.

One can also ask whether highly synthetic variables (using more than just the two dimensions illustrated in our “obligor lives in rural area” AND “0.60 < Loan-to-Value ≤ 0.80” example) offer any real advantage over a logit model. Though the middle-of-the-road approach is more frugal (at least in some applications), it achieves basically the same area under the curve (AUC) as the traditional approach.
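AUC can be read as the probability that a randomly chosen defaulter receives a higher PD than a randomly chosen non-defaulter. A minimal pairwise (Mann-Whitney) computation, with hypothetical PDs for the two approaches on the same obligors, shows how such a comparison would be made; the numbers are illustrative, not results from this article.

```python
from typing import Sequence


def auc(scores: Sequence[float], defaults: Sequence[int]) -> float:
    """AUC as the probability that a randomly drawn defaulter receives a higher
    score (PD) than a randomly drawn non-defaulter; ties count as one half."""
    pos = [s for s, d in zip(scores, defaults) if d == 1]
    neg = [s for s, d in zip(scores, defaults) if d == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical PDs from two models on the same six obligors.
defaults = [0, 0, 0, 1, 0, 1]
pd_traditional = [0.05, 0.10, 0.20, 0.30, 0.35, 0.60]  # e.g. logit on original drivers
pd_hybrid = [0.04, 0.12, 0.32, 0.30, 0.18, 0.55]       # e.g. logit with a few super parameters
print(auc(pd_traditional, defaults), auc(pd_hybrid, defaults))  # 0.875 0.875
```

The pairwise loop is quadratic in the number of obligors, which is fine for illustration; for a real portfolio a rank-based formula or a library routine would be used instead.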

If a highly synthetic variable is counted as only one parameter, it is also fair to ask whether the penalty for the number of variables used is correctly calibrated. Certainly, such a super parameter carries more model risk than the original features.
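One hypothetical way to make the penalty reflect this extra complexity is to charge a synthetic feature for each of its constituent conditions rather than once. The counting rule, the function `penalized_objective`, the penalty of 0.01 and the AUC values below are all illustrative assumptions, not the authors' proposal.

```python
from typing import Dict, Sequence


def penalized_objective(auc: float, features: Sequence[str], penalty: float,
                        n_components: Dict[str, int], count_components: bool) -> float:
    """AUC minus a per-parameter penalty. With count_components=True, a synthetic
    feature is charged for each of its constituent conditions instead of once."""
    k = sum(n_components.get(f, 1) if count_components else 1 for f in features)
    return auc - penalty * k


# Hypothetical models with identical AUC; "super_param_1" combines three conditions.
n_components = {"super_param_1": 3}
traditional = (0.80, ["ltv", "rural", "income", "age"])
hybrid = (0.80, ["super_param_1", "income"])

for label, (auc_value, feats) in [("traditional", traditional), ("hybrid", hybrid)]:
    naive = penalized_objective(auc_value, feats, 0.01, n_components, count_components=False)
    fair = penalized_objective(auc_value, feats, 0.01, n_components, count_components=True)
    print(label, round(naive, 3), round(fair, 3))
```

Counted as one parameter, the super parameter makes the hybrid model look better (0.78 vs. 0.76); counted by its components, the advantage disappears.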

Parting Thoughts

Applying ML to credit risk models is potentially promising. It allows for more advanced feature engineering than is used in traditional models, while still maintaining a logit-model setting that preserves explainability as much as possible.

However, this is not the whole story. While ML-engineered super parameters might be interesting and powerful tools, there are concerns about their stability and lack of sophistication.

At best, these solutions are an intermediate step toward a more responsible application of ML to future models. For this reason, one can best describe them as a “middle-of-the-road” solution.

Dr. Marco Folpmers (FRM) is a partner for Financial Risk Management at Deloitte Netherlands.

Linda Torn works in the analytics team at Deloitte Netherlands. She holds an MSc in Econometrics and Mathematical Economics, and wrote her master's thesis on the added value of machine learning for feature selection in PD models.


© 2024 Global Association of Risk Professionals