
Model Validation: Breaking Down the Glitch in a Key Tool

Written by Marco Folpmers | Nov 13, 2020 5:00:00 AM

To separate riskier from less risky customers, banks need to assess, as part of model validation, the discriminatory power of their credit risk models, i.e., their ability to rank obligors. Kendall's tau is a rank-correlation statistic that banks use to check how well a model ranks borrowers by probability of default (PD), but there is a glitch in the test that leaves it susceptible to manipulation.

What PD model properties does Kendall's tau evaluate? How is it used, in practice, today, and why is it imperfect?

Banks use discriminatory power assessments (along with calibration and stability tests) to validate PD rating models. The discriminatory power test is usually carried out by assessing the success of predicting the default/non-default property in a historical dataset.

Kendall's tau is one of the “discriminatory power” tools available to banks, and can be used in a variety of ways. For example, when banks use innovative methods for new rating systems (e.g., for predictive modeling or to enhance the modeling dataset with transactional data), Kendall's tau can be applied. Specifically, for credit risk modeling, it can be used when a bank is leveraging two competing internal rating systems - e.g., an internal model in production versus an internal challenger model.

Kendall's tau can also come in handy where an external rating is available (e.g., from an external credit assessment institution, or “ECAI”) and where, for each obligor, the external rating is compared with the rating provided by the internal rating system. In such a case, an objective (or, at least, outside) standard exists and the bank assesses whether its own ranking corresponds with this objective ranking.

As a back-test, Kendall's tau can compare the ranks of the one-year observed default rates (for the latest year) per PD bucket with the ranks of the bucket PDs. Ideally, the observed default rates will be monotonically increasing across PD buckets, yielding 100% rank-correlation.

Before we explore the glitch in Kendall's tau, let's discuss what it entails and how it works in practice.

Kendall's tau: Concordant and Discordant Pairs

Kendall's tau explores all possible pairs of PD rating buckets, for both internal and external rating systems. If there are, say, 20 rating buckets, one has $\binom{20}{2} = 20 \cdot 19 / 2 = 190$ possible combinations (or pairs).

You could, for example, rate bucket 3 versus bucket 5. If, for both rating systems, rating bucket 5 has a higher PD than rating bucket 3, the pair is said to be “concordant.” This is also the case in the (less probable) instance in which, for both rating systems, rating bucket 5 has a lower PD than rating bucket 3.

In contrast, if one rating system has a higher PD for rating bucket 3 than for rating bucket 5, and the other rating system has an opposite ranking for these buckets, the pair is said to be “discordant.”

Generally, for a rating system with $n$ buckets, the number of pairs is $n(n-1)/2$. If we denote the number of concordant pairs as $C$ and the number of discordant pairs as $D$, then Kendall's tau equals $\tau = \frac{C - D}{n(n-1)/2}$. Since by definition $C + D = n(n-1)/2$ (in this contribution, we abstract from the treatment of ties), it is easily seen that Kendall's tau scales to a measure between -1 and +1.
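The pair-counting definition above can be sketched in a few lines of Python. This is a minimal illustration, assuming no ties: tau is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs. The function name `kendall_tau` and the five-bucket PD values are hypothetical and are not taken from the article's figures.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau for two equal-length sequences, ignoring ties.

    x[i] and y[i] are the PDs that the two rating systems assign to bucket i.
    """
    if len(x) != len(y):
        raise ValueError("both rating systems must cover the same buckets")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        # The product is positive when both systems order buckets i and j
        # the same way, and negative when they order them differently.
        product = (x[i] - x[j]) * (y[i] - y[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    # Without ties, concordant + discordant equals n * (n - 1) / 2.
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical PDs for five buckets; system_b ranks buckets 2 and 3
# the other way round compared with system_a.
system_a = [0.001, 0.002, 0.004, 0.008, 0.016]
system_b = [0.001, 0.005, 0.003, 0.009, 0.020]
tau = kendall_tau(system_a, system_b)  # 9 concordant, 1 discordant -> 0.8
```

With five buckets there are 10 pairs; a single discordant pair already lowers tau from 1.0 to 0.8, which shows how sensitive the measure is when the number of pairs is small.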

In summary, Kendall's tau is the level of concordance between pairs of PD rating buckets, scaled to a value between -1 and +1. For example, if an internally-developed PD rating model produces predictions that match an external rating very closely, Kendall's tau will be close to +1.

Kendall's tau in Practice

Below we illustrate a typical (performing) PD scale, ranging from three basis points (bps) to 22%, with PD values increasing exponentially across 20 buckets.
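A scale of this shape can be generated as follows. This is only a sketch: it assumes that "increasing exponentially" means a constant ratio between neighbouring buckets and that the end points are exactly 3 bps and 22%. The variable names are illustrative, and the resulting values are not necessarily those plotted in the article's figure.

```python
n_buckets = 20
pd_low, pd_high = 0.0003, 0.22  # 3 bps and 22%

# A constant ratio between neighbouring buckets gives exponential growth.
ratio = (pd_high / pd_low) ** (1 / (n_buckets - 1))
pd_scale = [pd_low * ratio**k for k in range(n_buckets)]

for bucket, pd_value in enumerate(pd_scale, start=1):
    print(f"bucket {bucket:2d}: PD = {pd_value:.4%}")
```

Under these assumptions each bucket's PD is roughly 1.4 times that of the bucket below it.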

Suppose that the primary rating system (the incumbent model, or the ratings provided by an external agency) is shown with the help of the blue bars.

Figure 1: ECAI Ratings vs. Internal Ratings

Our second (challenger) rating system is shown with the help of the red dots. As depicted in Figure 1, the red dots follow the blue bars - more or less.

If we want to prove a high level of rank‐correlation of the challenger model compared with the incumbent one, Kendall's tau is the appropriate measure.

We have to stress here that Kendall's tau is a rank-correlation test for discriminatory power - i.e., the question is whether model 2 is capable of ranking the obligors in the same way as model 1. This means that the absolute values of the red dots are not important - we are only concerned with their ranks, vis-à-vis the ranks of the blue bars.

(This test should not be confused with a calibration test. For calibration, one could apply a Jeffreys test ‐ or a binomial test or Hosmer‐Lemeshow test - to assess the distance between the red dots and the top of the blue bars.)

Since in this study we are interested in whether model 2 ranks the obligors in the same way as model 1, we calculate Kendall's tau based on the ranks depicted by the blue bars and the ranks of the red dots.

In our example, Kendall's rank-order correlation coefficient equals 87%.

A Dangerous Glitch

Suppose now that we condense the rating systems in such a way that the number of rating buckets is 10 instead of 20. The rating system is condensed by combining rating buckets 1 and 2; 3 and 4; 5 and 6, etc.

For simplicity's sake, we assume that the values of the red dots are based on the same number of observations for each pair of rating buckets that we combine. So, we take the simple average to arrive at the red dots in the condensed situation depicted in Figure 2 below.

Figure 2: ECAI Ratings vs. Internal Ratings ‐ Condensed

 

In the condensed case, as can be seen in the monotonically-increasing red dots in Figure 2, Kendall's tau is 100%.

In short, Kendall's tau is not only dependent on the strength of the rank-correlation (or the level of concordance), but also on the number of buckets. This means that one could manipulate Kendall's tau outcomes by condensing the rating system.
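The effect can be reproduced with a small constructed example. The sketch below does not use the article's data: it assumes a hypothetical 20-bucket incumbent scale and a hypothetical challenger that ranks every adjacent pair of buckets (1 and 2, 3 and 4, etc.) the wrong way round. The helper names `kendall_tau` and `condense` are illustrative, ties are ignored, and buckets are merged by simple averaging, as in the text.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau without tie handling: (C - D) / (C + D)."""
    signs = [(x[i] - x[j]) * (y[i] - y[j])
             for i, j in combinations(range(len(x)), 2)]
    concordant = sum(s > 0 for s in signs)
    discordant = sum(s < 0 for s in signs)
    return (concordant - discordant) / (concordant + discordant)


def condense(values):
    """Merge buckets 1 and 2, 3 and 4, ... by taking the simple average."""
    return [(values[k] + values[k + 1]) / 2 for k in range(0, len(values), 2)]


# Hypothetical incumbent scale: 20 buckets with strictly increasing PDs.
incumbent = [0.0003 * 1.4**k for k in range(20)]

# Hypothetical challenger: the incumbent's values, swapped within each
# adjacent pair, so buckets (1, 2), (3, 4), ... are ranked in reverse.
challenger = []
for k in range(0, 20, 2):
    challenger += [incumbent[k + 1], incumbent[k]]

tau_20 = kendall_tau(incumbent, challenger)                      # 170 / 190
tau_10 = kendall_tau(condense(incumbent), condense(challenger))  # 1.0

print(f"20 buckets: tau = {tau_20:.1%}")
print(f"10 buckets: tau = {tau_10:.1%}")
```

In this constructed case, 10 of the 190 pairs are discordant, so tau is 170/190 (about 89%) with 20 buckets. After condensing, every ranking error has been averaged away inside a merged bucket and tau rises to 100%, even though the challenger's ability to rank obligors has not improved.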

Parting Thoughts

Risk professionals need to be aware of this glitch in Kendall's tau. If a modeler wants to demonstrate sufficient discriminatory power by linking the outcomes of a PD model to externally-provided ratings, this is more easily done for a PD system with a small number of buckets than for one with a large number of buckets, ceteris paribus.

Banks should be aware of the test's PD bucketing quirks and realize that internal thresholds for passing Kendall's tau can be reached more easily by arbitrarily condensing the rating system.

 

Dr. Marco Folpmers (FRM) is a partner for Financial Risk Management at Deloitte Netherlands. He is also a professor of financial risk management at Tilburg University/TIAS. The author wishes to thank Vittorio Maio (M.Sc.), who kindly reviewed a previous draft of this article.