Risk Weighted

The Data Conundrum for Risk Modelers

Forecasts, scenario projections and other types of model output are not data. They are subject to error and potential bias, and risk managers would do well to keep this in mind when they are validating models.

Thursday, March 28, 2024

By Tony Hughes


Credit risk modelers today face a myriad of challenges, ranging from scenario analysis to forecasting to validation. But when modelers use a loose definition of the term “data,” bad habits can form quickly, creating problems they must understand and protect against.

If you research the etymology of "data," you will find that its original meaning is closely tied to the concept of a fact, something that is known or generally assumed to be true.

In modern times, however, usage of the word has expanded to include pretty much anything that can be stored in a spreadsheet. Forecasts and scenario projections are often referred to as "data," and I've even seen output from Monte Carlo simulations described in the same way. I suspect this is because computer scientists define the term very broadly to include anything that can be stored on a server.

While experienced risk modelers generally understand the difference between data and "data," we must remember that our clients – including bankers and other business professionals – may not.

Communication of many aspects of model risk would be significantly improved if we were to adopt a more disciplined approach and strictly revert to the original meaning of the four-letter "d-word." Let me give you an example of what I mean.

Suppose there's a set of key inputs you rely on to calculate loss forecasts, and my company sells a range of projections for these variables. When I send over the spreadsheet, it's important that this be treated in the same way as any other model output and be subjected to the usual validation processes.

The numbers are only factual in the sense that it is true that my company processed some data and produced the forecasts. Rather than referring to them as “data,” the numbers should really be described as "model output" – or something similar.

In situations where the vendor spreadsheet is used to replace or challenge internal efforts, this won't cause much of a kerfuffle. But when the underlying data is more specialized, it is common for risk managers to confuse model output with proper, unprocessed data.

Imperfect Valuations, and the Importance of Proper Validation

About a year ago, I was working to validate a stress test related to loans for aviation equipment. Through the pandemic, the airline industry experienced heavy turbulence (pun intended) and airplane valuations suffered some extreme swings.

The bank in question was sourcing nowcasts of secondary market prices for planes with various configurations and condition gradings. Such projections are needed for loss given default (LGD) calculations and other purposes because the aviation market is lumpy and often illiquid, meaning that recent sales of similar aircraft often cannot be sourced.

The key point is that the bank was treating the valuations as if they were infallible. The bankers and risk managers all referred to the numbers in the spreadsheet as "data," and risk management systems made no allowance for possible error.

The underlying models were never subjected to proper validation processes. In reality, the output was going a little haywire in the aftermath of the pandemic – a familiar story – and the bank was consequently underestimating the amount of risk in its aviation portfolio.

This is just one example of many. Vendors often seek to add value to generic data offerings by augmenting them with a range of analytical bells and whistles designed to make their products immediately actionable by business users and to differentiate their offerings from their competitors.

These features are useful, because they provide unique insights for end users and take advantage of the fact that the vendor's analysts generally understand the raw data better than outsiders. But this doesn't change the fact that the add-ons are model-based and thus subject to error and potential bias.
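To see why treating a valuation as error-free can understate risk, consider a minimal sketch. Everything in it is an illustrative assumption rather than anything taken from the bank in the aviation example: the function names `lgd` and `expected_lgd`, the exposure and valuation figures, and the set of equally likely relative valuation errors are all hypothetical. The point is only that, because a loss cannot fall below zero, even a symmetric error around the vendor's number can raise the expected LGD above the figure obtained by plugging in the point estimate.

```python
def lgd(exposure: float, collateral_value: float) -> float:
    """Loss given default as a fraction of exposure, floored at 0 and capped at 1."""
    return min(1.0, max(0.0, 1.0 - collateral_value / exposure))


def expected_lgd(exposure: float, point_value: float, relative_errors: list[float]) -> float:
    """Average LGD across equally likely relative errors in the valuation."""
    lgds = [lgd(exposure, point_value * (1.0 + e)) for e in relative_errors]
    return sum(lgds) / len(lgds)


# Hypothetical figures: a loan of 100 against a plane the vendor values at 100.
exposure = 100.0
vendor_value = 100.0

# Assumed, symmetric valuation errors (the true value may be 30% lower or higher).
errors = [-0.30, -0.15, 0.0, 0.15, 0.30]

lgd_point = lgd(exposure, vendor_value)                        # valuation treated as "data"
lgd_with_error = expected_lgd(exposure, vendor_value, errors)  # valuation treated as model output

print(f"LGD using the point valuation:     {lgd_point:.2%}")
print(f"Expected LGD allowing for error:   {lgd_with_error:.2%}")
```

In this toy setup the upside errors contribute nothing, because the loss is already zero, while the downside errors produce real losses. A validation process that quantifies the error in the vendor's model is what makes the second calculation possible.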

Forward-Looking Fallacy

Another common trope, in widespread use across the industry, is to package forecasts and scenario projections as "forward-looking data," which is a concept I have skewered in the past. The use of the d-word in this context is frankly quite dishonest: it gives the impression of precision while hiding the fact that models and subjective gut feelings are at work in the background, attempting to divine the future.

"Forward-looking data" is great marketing; it sounds really wonderful, but, at the end of the day, true data can only be historical. Data may have characteristics that make it useful as a leading indicator, but this is not what people normally mean when they describe their product as “forward looking.”

While my focus has been on vendor products, internal model producers are not blameless, either. Often, they fail to correct colleagues who refer to the output from statistical modeling processes as "data," and allow them to treat the information in the spreadsheet as if it were precisely measured.

Parting Thoughts

When we learn statistics, we are drilled with the idea that more data is always beneficial. Tests become more powerful and confidence intervals become narrower. Predictions, moreover, are often made more accurate.

But these principles don’t always hold when the information fed to the models has been heavily modified. Data, like vegetables, often lose their nutritional value when subjected to modern industrial processes.

But just like with vegetables, the processing of data often makes it cheaper and easier to digest. When we use statistics we don’t fully understand, we must resign ourselves to the reality that we are missing some of the nutritional value we could access if we processed the data in our own kitchens.

We shouldn’t pretend that we are living holistically when we are not. The term “data” should be reserved for actual measurements – and model output should be labeled appropriately.

It is up to experienced risk modelers to be more disciplined in their use of these terms, so that they are not misunderstood.


Tony Hughes is an expert risk modeler. He has more than 20 years of experience as a senior risk professional in North America, Europe and Australia, specializing in model risk management, model build/validation and quantitative climate risk solutions. He writes regularly on climate-related risk management issues at


© 2024 Global Association of Risk Professionals