The Complexity Cascade: Why Physical Risk Models Diverge — and What Governance Needs to Do About It

Written by David K. Kelly | February 19, 2026

Why do physical risk vendors produce divergent estimates for the same properties? The core reason is a complexity cascade in which modeling choices compound across weather, hydrology, hydraulics, damage functions and forward-looking scenarios. Robust model governance is needed to make those choices transparent and decision-useful for banks.


In October 2025, the GARP Risk Institute published A Risk Professional’s Guide to Physical Risk Assessments, a benchmarking study of 13 physical risk data vendors conducted for the U.K.'s Climate Financial Risk Forum. The study found large spreads in both hazard and loss estimates, with many properties deemed highly exposed by one vendor and relatively safe by another. This divergence is not primarily a matter of flawed science, since most vendors draw on well-established academic models. Instead, it reflects where and how each provider makes methodological choices along a multi-step modeling chain, and how little of that chain is visible to banks today.

Where the Complexity Cascade Starts

Physical risk assessment for banking brings together models across several domains, broadly described as:

Weather and climate hazards
Hydrology
Hydraulics
Damage and vulnerability
Forward‑looking stress scenarios

Each domain uses frameworks that have been tested — with the results published in scientific literature — from thermodynamic representations of atmospheric processes and Saint‑Venant equations that describe open‑channel flow, to empirical flood depth‑damage relationships for buildings. Consequently, the main challenge for physical risk modeling lies in the way the domains' sequential application multiplies uncertainty and creates divergent outputs.
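To make the compounding concrete, here is a minimal sketch. It assumes, purely for illustration, that each stage contributes an independent multiplicative error, so that the variances of the log errors add. The five sigma values are hypothetical placeholders, not estimates from the GARP study or from any vendor.

```python
import math

# Hypothetical one-sigma spreads in log space for each stage of the chain.
# These numbers are illustrative assumptions, not measured values.
STAGE_LOG_SIGMA = {
    "weather": 0.20,
    "hydrology": 0.25,
    "hydraulics": 0.25,
    "damage": 0.30,
    "scenario": 0.20,
}


def combined_log_sigma(sigmas):
    """Combine independent multiplicative errors: log-variances add."""
    return math.sqrt(sum(s ** 2 for s in sigmas))


total_sigma = combined_log_sigma(STAGE_LOG_SIGMA.values())
widest_stage = max(STAGE_LOG_SIGMA.values())

# Express each spread as a multiplicative factor on the final loss estimate.
print(f"widest single stage: x{math.exp(widest_stage):.2f}")
print(f"full chain:          x{math.exp(total_sigma):.2f}")
```

Under these assumptions the combined one-sigma spread is a factor of about 1.7, against about 1.35 for the widest single stage. Correlated or non-multiplicative errors would change the arithmetic, so this is a sketch of the mechanism and not a measurement of it.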

Flood risk illustrates this cascade clearly. At the hazard stage, vendors choose between global or continental weather and climate products, such as those based on the European Centre for Medium-Range Weather Forecasts (ECMWF) system, which capture planetary-scale dynamics on kilometer-scale grids, and higher-resolution regional or convection-permitting models that better capture localized, intense rainfall but cover smaller areas. Each option involves trade-offs: global systems can under-represent peak intensities in short-lived convective storms, while regional systems may lack coverage for all markets in which a global bank operates.

Those weather outputs then feed hydrological models that partition rainfall between infiltration (water seeping into the ground), storage (soil retaining water), and runoff (water flowing across the top of the ground), as shown in Diagram 1. Providers may apply different infiltration schemes or runoff formulations, each embodying different assumptions about soil properties, underlying moisture and spatial variability. Two vendors using the same rainfall but different hydrological formulations and parameterizations will naturally produce different runoff volumes and timing.
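As an illustration of how the hydrological parameterization alone moves the answer, the sketch below uses the SCS Curve Number relationship, a widely used empirical runoff formula. It is chosen here for brevity, not because any vendor in the study is known to use it. The two curve numbers and the 100 mm storm total are hypothetical.

```python
def scs_runoff_mm(rain_mm: float, curve_number: float, ia_ratio: float = 0.2) -> float:
    """Direct runoff depth (mm) from the SCS Curve Number relationship."""
    if not 0 < curve_number <= 100:
        raise ValueError("curve_number must be in (0, 100]")
    # Potential maximum retention S, in millimetres.
    retention = 25400.0 / curve_number - 254.0
    # Initial abstraction: rainfall absorbed before any runoff begins.
    initial_abstraction = ia_ratio * retention
    if rain_mm <= initial_abstraction:
        return 0.0
    excess = rain_mm - initial_abstraction
    return excess ** 2 / (excess + retention)


rain = 100.0  # same storm total fed to both hypothetical vendors
runoff_a = scs_runoff_mm(rain, curve_number=70)  # more permeable assumption
runoff_b = scs_runoff_mm(rain, curve_number=85)  # less permeable assumption

print(f"vendor A runoff: {runoff_a:.1f} mm")
print(f"vendor B runoff: {runoff_b:.1f} mm")
```

With these inputs the same 100 mm storm yields roughly 33 mm of runoff under the first parameterization and roughly 61 mm under the second, before any hydraulic modeling has taken place.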

Runoff estimates are then passed into hydraulic routing models which estimate how the water will flow through rivers and channels. Here, choices expand again: one‑dimensional channel models, two‑dimensional surface flow models, or coupled one- and two-dimensional models that are increasingly viewed as good practice for complex floodplains and urban areas.

Diagram 1: The hydrological cascade: how rainfall is partitioned between infiltration, storage, and runoff — a key source of model divergence

Within a given hydraulic approach, practitioners need to set parameters such as roughness coefficients, and select and preprocess topographic data. To take just one parameter, typical values for Manning’s roughness in “clean, straight” natural channels can differ by 50% or more between equally defensible engineering judgements, and shifts in those values alter simulated depths and velocities. Digital terrain models derived from high-resolution LiDAR can materially improve the representation of flow pathways and defense structures, but they are not available everywhere, forcing vendors to choose between coarser datasets, interpolation, or bespoke local surveys.
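The sensitivity to roughness can be sketched with Manning's equation for a wide rectangular channel, where normal depth is y = (q n / sqrt(S))^(3/5) for unit discharge q, roughness n and bed slope S in SI units. The discharge, the slope and the two roughness values below are hypothetical, and the wide-channel simplification ignores overbank flow.

```python
import math


def normal_depth_wide_channel(unit_discharge: float, manning_n: float, slope: float) -> float:
    """Normal depth (m) in a wide rectangular channel, SI units.

    Derived from Manning's equation with hydraulic radius ~ depth:
    q = (1/n) * y**(5/3) * sqrt(S)  =>  y = (q * n / sqrt(S)) ** (3/5)
    """
    return (unit_discharge * manning_n / math.sqrt(slope)) ** 0.6


q = 5.0        # hypothetical discharge per unit width, m^2/s
slope = 0.001  # hypothetical bed slope

depth_low_n = normal_depth_wide_channel(q, 0.030, slope)
depth_high_n = normal_depth_wide_channel(q, 0.045, slope)  # roughness 50% higher

# Velocity follows from continuity: v = q / y.
velocity_low_n = q / depth_low_n
velocity_high_n = q / depth_high_n

print(f"n=0.030: depth {depth_low_n:.2f} m, velocity {velocity_low_n:.2f} m/s")
print(f"n=0.045: depth {depth_high_n:.2f} m, velocity {velocity_high_n:.2f} m/s")
```

In this simplified geometry, a 50% increase in n raises depth by about 28% and lowers velocity by about 22%, regardless of the discharge and slope chosen.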

The hydraulic outputs (water depths, extents, velocities and durations) then drive damage models that convert hazard information into loss estimates. Many implementations rely on generic depth-damage curves and regional “typical building” archetypes. Yet construction practices and adaptation measures can differ sharply between, say, Mediterranean coastal towns and Northern European riverine communities, as well as between properties built under different building codes, even in the same city. When vendors apply static, region-averaged damage functions without explicit links to local building characteristics, they can generate very different loss ratios for the same hazard. This spatial complexity is compounded by a temporal dimension that further complicates model comparison.
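A depth-damage curve is typically applied by interpolating between tabulated points. The sketch below does this for two hypothetical curves, one standing in for a region-averaged archetype and one for a building with a raised ground floor. All curve values are invented for illustration and do not come from any published damage function.

```python
from bisect import bisect_right


def damage_ratio(depth_m, curve):
    """Piecewise-linear interpolation of a depth-damage curve.

    `curve` is a list of (depth_m, damage_ratio) points sorted by depth.
    Depths outside the tabulated range are clamped to the end values.
    """
    depths = [d for d, _ in curve]
    ratios = [r for _, r in curve]
    if depth_m <= depths[0]:
        return ratios[0]
    if depth_m >= depths[-1]:
        return ratios[-1]
    i = bisect_right(depths, depth_m)  # index of first knot above depth_m
    d0, d1 = depths[i - 1], depths[i]
    r0, r1 = ratios[i - 1], ratios[i]
    return r0 + (r1 - r0) * (depth_m - d0) / (d1 - d0)


# Hypothetical curves: (flood depth in metres, share of building value lost).
REGIONAL_AVERAGE = [(0.0, 0.00), (0.5, 0.20), (1.0, 0.35), (2.0, 0.55), (3.0, 0.70)]
RAISED_GROUND_FLOOR = [(0.0, 0.00), (0.5, 0.00), (1.0, 0.15), (2.0, 0.40), (3.0, 0.60)]

depth = 0.8  # same modeled flood depth for both curves
loss_regional = damage_ratio(depth, REGIONAL_AVERAGE)
loss_raised = damage_ratio(depth, RAISED_GROUND_FLOOR)

print(f"regional-average curve: {loss_regional:.2f}")
print(f"raised-floor curve:     {loss_raised:.2f}")
```

For an identical 0.8 m flood depth, the two assumed curves give loss ratios of about 0.29 and 0.09, which is the kind of gap that appears when the building archetype is not tied to local characteristics.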

The Temporal Mismatch Problem

Along with this spatial and methodological complexity, there is a temporal mismatch between how climate scenarios are developed and how hazard-loss models are applied in practice. Traditional catastrophe modeling has been designed for the insurance sector, where annual renewals and short-term portfolio management are standard. These models usually rely on historical event catalogues, supplemented by synthetic events, calibrated so that hazard frequency and severity align with observed statistics from recent decades.

Banking supervisors, however, now expect firms to assess physical risk over time horizons aligned with mortgage books and long‑dated asset lifetimes — often 20–30 years or more — using scenarios such as those provided by the Network for Greening the Financial System (NGFS). NGFS Phase V scenarios describe future climate and macroeconomic pathways in terms of temperature trajectories, emissions profiles, policy assumptions and economy‑wide damage estimates, including potential GDP losses by mid‑century under current policies. They do not, by design, specify how a "1.5°C world in 2040" translates into a different probability distribution for a 1‑in‑100‑year flood at a particular location.

To bridge that gap, vendors must make explicit methodological choices. They might, for example, adjust historical storm intensities using temperature–precipitation relationships derived from climate models, dynamically downscale global circulation models to regional weather fields, or apply pattern‑scaling techniques based on thermodynamic constraints. Each approach can materially change the projected frequency and severity of extremes, and academic literature does not yet offer a clear consensus on which methods are best suited to the time horizons needed for banking portfolios.
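The first of these options, scaling historical intensities with temperature, can be sketched as follows. The example assumes 7% more rainfall per degree Celsius of warming, close to the Clausius-Clapeyron rate often cited for extreme precipitation, and a Gumbel distribution for annual maximum daily rainfall. The Gumbel parameters and the 1.5°C of warming are hypothetical, and real vendor implementations may differ substantially.

```python
import math


def gumbel_return_level(location, scale, return_period_years):
    """Rainfall depth exceeded on average once per `return_period_years`."""
    annual_non_exceedance = 1.0 - 1.0 / return_period_years
    return location - scale * math.log(-math.log(annual_non_exceedance))


def future_return_period(level, location, scale, warming_c, rate_per_c=0.07):
    """Return period of `level` after scaling all intensities with warming.

    Multiplying every rainfall value by a constant factor multiplies both
    Gumbel parameters by that factor.
    """
    factor = (1.0 + rate_per_c) ** warming_c
    z = (level - factor * location) / (factor * scale)
    annual_exceedance = 1.0 - math.exp(-math.exp(-z))
    return 1.0 / annual_exceedance


# Hypothetical historical fit for annual maximum daily rainfall (mm).
LOCATION_MM, SCALE_MM = 60.0, 15.0

level_100 = gumbel_return_level(LOCATION_MM, SCALE_MM, 100)
period_after_warming = future_return_period(level_100, LOCATION_MM, SCALE_MM, warming_c=1.5)

print(f"historical 1-in-100 daily rainfall: {level_100:.0f} mm")
print(f"return period after 1.5 C warming:  {period_after_warming:.0f} years")
```

With these assumed inputs, the rainfall depth that was a 1-in-100-year event historically becomes roughly a 1-in-44-year event. A different scaling rate or a different extreme-value distribution would shift that figure, which is precisely the kind of choice the paragraph above describes.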

There is also a second layer of potential inconsistency. NGFS scenarios use top-down integrated assessment models and macro‑economic damage functions to estimate aggregate GDP impacts under different climate pathways. Vendor physical risk models, in contrast, typically work bottom‑up from property‑level hazard and damage estimates. If banks derive asset‑level risk metrics from one modeling chain but benchmark their business models and disclosures against macro‑level NGFS outputs produced by another, there is an implicit assumption that these two chains are methodologically compatible. Without systematic testing of that assumption, there is a risk that reported numbers are internally inconsistent.

Diagram 2: The five-domain modeling cascade showing how methodological choices compound across weather, hydrology, hydraulics, damage functions and forward-looking scenarios

What Banking Model Governance Needs to Do

Traditional vendor model governance — requesting documentation, checking for coding errors, and running parallel calculations — is not sufficient on its own to address this complexity cascade. Most banks do not have internal teams spanning meteorology, hydrology, hydraulics, structural engineering, and climate scenario analysis, and cannot exhaustively rebuild or revalidate every component. Governance therefore needs to shift from trying to “redo the model” to making the modeling chain transparent enough that firms can understand, compare, and challenge what they are buying.

Several elements are particularly important:

Choice architecture
Banks should expect a clear map of which models are used at each stage of the chain — hazard, hydrology, hydraulics, damage and scenario translation — and why.
Assumption dependencies
Vendors should explain how decisions in one module constrain or drive choices in subsequent modules. If synthetic storm catalogues are adjusted using specific temperature-precipitation relationships, the empirical basis and uncertainty ranges for those relationships should be documented.
Divergence quantification
Sensitivity analysis is essential. Vendors should show how reasonable alternative choices at different points in the cascade — such as alternative hydrological formulations, hydraulic schematizations or damage curves — alter key outputs. This helps users understand whether differences between providers are driven mainly by upstream hazard modeling, exposure and vulnerability assumptions, or downstream aggregation and metrics.
Limitations of use
Clear statements of where models are and are not reliable are particularly important for global institutions operating across many jurisdictions. Vendors should set out any constraints by geography, hazard type, asset class or time horizon, and specify cases where the model components are being used outside their original design scope.
Temporal uncertainty and scenario construction
Vendors need to describe how they construct forward-looking projections from backward-looking models, and how those projections connect to NGFS or other supervisory scenarios. That includes explaining where there is robust physical or statistical evidence, and where the projections are more speculative, especially over longer horizons when internal climate variability becomes significant.
Monitoring and backtesting
Finally, effective governance requires ongoing performance assessment. As new extremes occur — for example, events where a year’s worth of rain falls in a single day in a major European city — vendor models should be tested against observed impacts and updated if material discrepancies emerge. Banks can use those events as structured case studies in their model risk frameworks.

For each key choice, vendors should document known strengths and limitations, including specific circumstances where the model may be less reliable — for example, acknowledging where a given weather system underrepresents short‑duration convective extremes, or under what conditions a one-dimensional hydraulic approach becomes unreliable due to extensive flow outside a river's banks (overbank flow).

If damage curves assume particular construction types or adaptation levels, vendors should make their assumptions about building stock composition and protection standards explicit.
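The sensitivity analysis described under divergence quantification can be sketched as a one-at-a-time test on a deliberately crude three-stage chain. Everything here is an assumption made for illustration: the SCS Curve Number runoff formula, the wide-channel Manning depth, the linear damage function, the runoff-to-discharge factor, the bank height and all parameter values are hypothetical and stand in for whatever a vendor actually uses.

```python
import math


def toy_loss_ratio(rain_mm, curve_number, manning_n, damage_per_m):
    """Toy rainfall-to-loss chain: hydrology -> hydraulics -> damage."""
    # Hydrology: SCS Curve Number runoff (mm).
    retention = 25400.0 / curve_number - 254.0
    excess = rain_mm - 0.2 * retention
    runoff_mm = 0.0 if excess <= 0 else excess ** 2 / (excess + retention)
    # Hydraulics: wide-channel Manning normal depth. The 0.1 factor turning
    # runoff into unit discharge and the 0.001 slope are hypothetical.
    unit_discharge = 0.1 * runoff_mm
    depth_m = (unit_discharge * manning_n / math.sqrt(0.001)) ** 0.6
    flood_depth_m = max(0.0, depth_m - 2.0)  # hypothetical 2 m bank height
    # Damage: linear in flood depth, capped at total loss.
    return min(1.0, damage_per_m * flood_depth_m)


BASELINE = {"curve_number": 75.0, "manning_n": 0.030, "damage_per_m": 0.25}
ALTERNATIVES = {"curve_number": 85.0, "manning_n": 0.045, "damage_per_m": 0.35}


def one_at_a_time(rain_mm):
    """Swap in each alternative separately and record the change in loss."""
    base = toy_loss_ratio(rain_mm, **BASELINE)
    deltas = {}
    for name, alternative in ALTERNATIVES.items():
        params = dict(BASELINE, **{name: alternative})
        deltas[name] = toy_loss_ratio(rain_mm, **params) - base
    return base, deltas


base, deltas = one_at_a_time(100.0)
print(f"baseline loss ratio: {base:.3f}")
for name, delta in deltas.items():
    print(f"  switch {name}: {delta:+.3f}")
```

A table of this kind, produced by the vendor for its own chain and its own defensible alternatives, is what lets a bank see which stage drives the difference between two providers. One-at-a-time tests miss interactions between stages, so they are a starting point and not a full uncertainty analysis.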

Parting Thoughts: From Disclosure to Capital‑Relevant Risk Management

GARP's benchmarking work shows that vendors are actively developing tools to meet banks’ needs, but also that dispersion across providers remains high. Until governance frameworks explicitly address the complexity cascade — and focus on transparent methodological choices, sources of divergence and temporal uncertainty — physical climate risk is likely to remain more prominent in narrative disclosures than in capital calculations under Pillar 2 or, ultimately, Pillar 1.

Addressing this gap will also require a cultural shift for some vendors. In many parts of banking, competitive advantage has not rested on proprietary models — core derivatives pricing approaches, for instance, have long been documented in the public domain — but on implementation quality, client service, and trusted risk partnerships. Physical risk vendors that embrace transparency and treat governance as an enabler of better decisions rather than a compliance hurdle are likely to become preferred partners as financial institutions develop climate-aware risk management capabilities.

Over time, vendors that clearly document their modeling choices, quantify divergence drivers, and articulate limitations should see their outputs used more broadly in risk identification, portfolio management, and capital planning. Those that continue to treat their models as “black boxes” may find their role limited to high‑level disclosure purposes, where numbers are visible but not easily integrated into daily risk frameworks.

David K. Kelly is Chief Science Officer at MKM Research Labs and author of Weather Patterns to Physical Risk Swaps and Handbook of Model Risk Management for Vendors. He has 30 years of experience in investment banking, including senior positions in front office and risk at global systemically important banks.
