Model Validation: Dissecting the Boundaries of a Rules-Based World

Banks’ internal ratings-based (IRB) processes for validating credit risk models have historically been subject to rules-based supervision established by the European Banking Authority (EBA). However, through a recent consultation on its handbook for model validation, the EBA is now seeking feedback from regional financial authorities and from banks on its requirements.

Will this ultimately lead to a loosening of the EBA’s credit risk model validation rules? The only way to evaluate whether we’ll see any real changes (after the EBA receives comments from market participants) is to take a closer look at both the handbook and the questions the supervisor is asking in the consultation paper.

Marco Folpmers

Superficially, the EBA’s handbook aims at harmonizing supervisory practices across the EU. Evaluating validation practices at the banks within their remit is, after all, one of the key responsibilities of the “competent authorities” the handbook seeks to address.

For banks, the handbook not only provides supervisory guidance on model validation but also offers a helpful summary of existing regulation – especially the Capital Requirements Regulation, the Commission Delegated Regulation, and lower-level EBA standards.

What are the key takeaways from the handbook? It’s a comprehensive document, but there are five important guidelines that should capture the attention of both competent authorities and the banks under their supervision:

1. IRB validation is more than “just a model validation.” Authorities are asked to review whether banks have “a set of policies, processes and procedures” in place to check models’ input data quality, accuracy and performance.
2. IRB validation should reside within the second line of defense and needs to be independent, vis-à-vis the first-line credit risk control unit (CRCU). Sufficient resources need to be allocated to validation to meet this condition.
3. “Rules of the game” are specified for both initial (“first”) and subsequent validations. Since they focus more on performance and less on, e.g., model design, subsequent validations are typically less intense than initial validations.
4. Competent authorities are supposed to check the structure of banks’ validation reports. This type of report needs to contain, at a minimum, a list of all deficiencies (including an assessment of materiality and severity); an assessment of the overall performance of the model; and an evaluation of the level of confidence one can have in the results of the statistical tests – particularly if data is lacking.
5. Validation tests should center around risk differentiation (i.e., discriminatory power) and risk quantification (i.e., the accuracy of the estimates, without bias). Statistical results must be assessed both against absolute thresholds (e.g., a threshold for the area-under-the-curve) and against historical performance.
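To make the last guideline concrete, the sketch below computes the area under the ROC curve (AUC) from scores and default flags, then checks the result against an absolute floor and against earlier validation results. It is a minimal illustration only: the function names, the 0.70 floor and the 0.05 tolerated drop are hypothetical choices made for this example, not values taken from the handbook.

```python
from typing import List, Sequence


def auc(scores: Sequence[float], defaults: Sequence[int]) -> float:
    """AUC via pairwise comparison of defaulters and non-defaulters.

    Assumes a higher score means higher risk. Ties count as one half.
    """
    bad = [s for s, d in zip(scores, defaults) if d == 1]
    good = [s for s, d in zip(scores, defaults) if d == 0]
    if not bad or not good:
        raise ValueError("need at least one default and one non-default")
    wins = 0.0
    for b in bad:
        for g in good:
            if b > g:
                wins += 1.0
            elif b == g:
                wins += 0.5
    return wins / (len(bad) * len(good))


def assess_discrimination(
    current_auc: float,
    previous_aucs: Sequence[float],
    absolute_floor: float = 0.70,  # hypothetical threshold, for illustration
    max_drop: float = 0.05,        # hypothetical tolerated deterioration
) -> List[str]:
    """Return findings from the absolute and the historical comparison."""
    findings = []
    if current_auc < absolute_floor:
        findings.append("below absolute threshold")
    if previous_aucs:
        baseline = sum(previous_aucs) / len(previous_aucs)
        if baseline - current_auc > max_drop:
            findings.append("deterioration versus history")
    return findings


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.3, 0.2, 0.6, 0.1]
    defaults = [1, 1, 0, 0, 0, 1]
    current = auc(scores, defaults)
    print(round(current, 3), assess_discrimination(current, [0.75, 0.74]))
```

In practice a bank would set such thresholds per model type and portfolio, and would pair this check with risk quantification tests, which this sketch does not cover.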

The First Level: Theoretical vs. Practical

While these guidelines certainly make some sense, they are arguably too theoretical, and therefore miss certain modeling nuances. The practices described in the handbook run the risk of prescribing a theoretical approach, whereas risk-based approaches are now seen as preferable because they make the most of scarce validation resources.

Gerrit Reher

For example, the handbook requires validation activities to be structured according to validation type (initial vs. subsequent). However, it does not require model tiering, even though, in practice, this is an equally important structural device for prioritizing the deployment of scarce validation resources.

The same impracticality can be seen in the rule for risk differentiation and risk quantification (see #5, above). The handbook invites competent authorities to confirm whether statistical results are not only checked against “backstop” absolute thresholds but also via “ad-hoc comparative analysis.” For subsequent validations, it states that “the comparison between the latest results of the validation and the ones observed in the previous years (…) can be used to detect a trend (i.e., a deterioration) in the model performance.”

However, this is, again, a highly theoretical view of validation. In practice, it is not always straightforward to compare the “latest” validation results with data from “previous years.” Depending on circumstances, the measured discriminatory power of a model may even be worse in “previous years” than on recent data.

The fact that input information is often subject to remediation is one of the reasons behind this inadequacy. Remediation procedures remove data deficiencies and enhance a model’s signal-to-noise ratio, giving data from later years a validation performance edge over older (model development) samples that lack remediation.
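The effect of remediation on measured performance can be illustrated with a small simulation. The sketch below is a stylized assumption, not a description of any bank's data: one latent risk driver determines defaults through a logistic link, and the same obligors are scored twice, once with a noisy (unremediated) input and once with a cleaner (remediated) input. All parameter values are hypothetical.

```python
import math
import random


def auc(scores, defaults):
    """AUC via pairwise comparison; higher score means higher risk."""
    bad = [s for s, d in zip(scores, defaults) if d == 1]
    good = [s for s, d in zip(scores, defaults) if d == 0]
    if not bad or not good:
        raise ValueError("need at least one default and one non-default")
    wins = sum((b > g) + 0.5 * (b == g) for b in bad for g in good)
    return wins / (len(bad) * len(good))


def simulate_remediation(n=2000, noise_before=1.5, noise_after=0.3, seed=42):
    """Return (AUC on unremediated input, AUC on remediated input)."""
    rng = random.Random(seed)
    # Latent risk driver and default flags from an assumed logistic link.
    driver = [rng.gauss(0.0, 1.0) for _ in range(n)]
    defaults = [
        int(rng.random() < 1.0 / (1.0 + math.exp(-(-2.0 + 1.5 * x))))
        for x in driver
    ]
    # The observed input is the driver plus measurement noise;
    # remediation is modeled as a reduction in that noise.
    before = [x + rng.gauss(0.0, noise_before) for x in driver]
    after = [x + rng.gauss(0.0, noise_after) for x in driver]
    return auc(before, defaults), auc(after, defaults)


if __name__ == "__main__":
    auc_before, auc_after = simulate_remediation()
    print(f"unremediated AUC: {auc_before:.3f}, remediated AUC: {auc_after:.3f}")
```

Under these assumptions the cleaner input yields the higher AUC even though the underlying risk is unchanged, which is why a naive year-over-year comparison can mislead.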

The Second Level: Highlighted Items

The handbook contains “focus boxes” and “interaction boxes” that are intended to attract the attention of the targeted competent authorities. These notations are expected to aid a supervisory dialogue between a bank and its competent authority, so they deserve a closer look.

Special attention is given to the required structural independence of the validation unit. Repeating guidance from the Commission Delegated Regulation (CDR), the handbook states that structural independence does not necessarily have to be achieved by separation from the credit risk control unit. The validation group, in fact, can be linked to the CRCU, as long as the staff performing the validations is different from the staff responsible for the model development. (This is only allowed, however, at small and less complex banks.)

Risk data is also addressed in one of the handbook’s interaction boxes. The box states that, although the bank’s IT function is responsible for data collection, processing, transformation and storage, the validation function is expected to have sufficient knowledge of the IT infrastructure. Moreover, the validation function must have knowledge of “all data quality checks performed with respect to IRB models, [as well as] their inputs, outputs and the calculation of the own-funds requirements.”

Whereas the data inputs are clearly within the scope of the validation report as a model input dimension, the obligations to have “sufficient knowledge” of all data-quality checks and to understand the checks related to the calculation of the own-funds requirements are more far-reaching. Typically, those responsibilities are beyond the scope of the model validation team.

The Third Level: Supervisory Dilemmas

At the third level are the validation dilemmas that continue to confound supervisors. The consultation paper therefore asks for industry guidance on a total of six questions.

The questions deal with the split of first versus subsequent validations; the validation of rating systems that are used across different entities; the review of the definition of default; the scope of the back-testing; the supervisory slotting approach; and data scarcity and output testing. Comments can be submitted until October 28 of this year.

These areas are the most interesting parts of the consultation paper, because they demarcate the very boundaries of the rules-based world. By asking questions and seeking feedback on these issues, the EBA is leaving room for judgment and dialogue.

The query on data scarcity and output testing (model development vs. validation samples) is the one we’d like to focus on here. In the handbook, out-of-sample (OOS) and out-of-time (OOT) data for model validation are clearly defined.
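As a rough illustration of the two sample types, the sketch below splits a set of observations into a development sample, an OOS sample and an OOT sample. It follows one common reading of the terms (OOT: observations from a period after the development window; OOS: observations from the development window that are held back from model fitting) and should not be taken as the handbook's exact wording. The `Observation` record, the `split_samples` function and the 30% holdout share are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Observation:
    obligor_id: str
    year: int
    defaulted: bool


def split_samples(
    observations: Sequence[Observation],
    oot_from_year: int,
    oos_share: float = 0.3,  # hypothetical holdout share
    seed: int = 0,
) -> Tuple[List[Observation], List[Observation], List[Observation]]:
    """Return (development, out-of-sample, out-of-time) samples."""
    rng = random.Random(seed)
    development, oos, oot = [], [], []
    for obs in observations:
        if obs.year >= oot_from_year:
            # Later period than the development window: out-of-time.
            oot.append(obs)
        elif rng.random() < oos_share:
            # Same period, randomly held back from fitting: out-of-sample.
            oos.append(obs)
        else:
            development.append(obs)
    return development, oos, oot


if __name__ == "__main__":
    data = [
        Observation(f"obligor-{i}", 2015 + i % 5, i % 25 == 0)
        for i in range(100)
    ]
    dev, oos, oot = split_samples(data, oot_from_year=2019)
    print(len(dev), len(oos), len(oot))
```

The holdout here is drawn per observation for simplicity. A real split would usually hold out whole obligors so that the same counterparty does not appear in both the development and the OOS sample.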

Ideally, as opposed to being divided between model development and validation testing, both OOS and OOT data should be reserved for validation tests. The handbook, however, interestingly takes a more nuanced view of these data types, stating that “alternative validation approaches” are possible in the context of data scarcity.

Choosing between an OOT and an OOS data sample for model validation (leaving the other for model development enhancement) is one such alternative approach. The desirability of this “second-best” solution is, of course, debatable.

On the one hand, to ensure that performance statistics are unbiased, one requires complete and representative validation data, ideally based on both OOS and OOT data. On the other hand, adding data to a smaller data set (either the OOT or OOS sample) can dramatically improve the precision of the estimates.
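The precision side of this trade-off can be made tangible with the textbook standard error of an observed default rate, sqrt(p(1 - p) / n), which assumes independent defaults (a simplification for credit portfolios). The function name and the figures below are hypothetical and chosen only for illustration.

```python
import math


def default_rate_standard_error(default_rate: float, n_obs: int) -> float:
    """Binomial standard error of an observed default rate.

    Assumes independent defaults, which is a simplification.
    """
    if n_obs <= 0:
        raise ValueError("n_obs must be positive")
    if not 0.0 <= default_rate <= 1.0:
        raise ValueError("default_rate must lie in [0, 1]")
    return math.sqrt(default_rate * (1.0 - default_rate) / n_obs)


if __name__ == "__main__":
    # Hypothetical case: a 2% default rate on a small sample of 500,
    # then on a pooled sample of 2,000 observations.
    for n in (500, 2000):
        print(n, f"{default_rate_standard_error(0.02, n):.4%}")
```

Quadrupling the sample halves the standard error, so on a scarce data set the gain in precision from pooling can be material. What the formula cannot show is the cost: the pooled data no longer provides an independent check on the model.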

Regardless of where one stands on the data scarcity/output testing issue, one thing is certain: the EBA has clearly demonstrated, with the help of the six questions, where its rules-based world ends – and where dialogues (and, potentially, validation dilemmas) begin.

Parting Thoughts

The EBA consultation paper on IRB validation provides unique insights about the areas in which the supervisor is struggling to come up with solutions to validation dilemmas. Under a rules-based approach, clear answers are necessary. However, in practice, many validation issues are more nuanced.

Even after the EBA issues more complex guidance, adding another layer to its rules-based approach to validation, boundaries will remain. The supervisor may well set up further useful credit risk modeling and validation standards for banks, but there are limits to what it can define. Recognizing these limits, the EBA has asked the industry (local authorities and banks) for help with its most problematic validation issues.

The six questions for consultation are therefore the most interesting parts of the supervisory manual. The industry is being asked to think along with the supervisor. Will the six aforementioned issues be resolved by further rules, or do they simply represent the boundaries of the rules-based world, where judgment will be expected to begin?

Dr. Marco Folpmers (FRM) and Dr. Gerrit Reher are partners for Financial Risk Management at Deloitte, for the Netherlands and Germany, respectively.

Topics: Modeling
