Generative AI has arrived with breathtaking speed. Each new model is larger, more capable, and more powerful than its predecessor. Benchmarks are shattered, context windows extended, and outputs that once seemed impossible now appear routine.
But anyone in risk management knows that scale alone is not the measure of trust. Just as in cities where skyscrapers can rise faster than residents can count them, it is not the height of the tower that makes it livable. It is the foundation, the plumbing, the wiring, and the finishing touches that make the building safe for daily use.
The same applies to AI in regulated environments. Demos may impress, but the true test of generative AI is not whether it can generate fluent text or realistic images. It is whether the system is defensible, reproducible, and reliable under scrutiny, particularly as the European Union’s Digital Operational Resilience Act (DORA) obligations and the AI Act compliance deadlines take effect.
In housing, the observable qualities are obvious: curb appeal, large rooms, modern appliances. These are easy to measure and easy to market. AI has its equivalent: benchmark scores, fluent outputs, multimodal features, and sleek interfaces. These are what make headlines and drive personal adoption, where individuals can catch and correct errors themselves.
But the non-observable qualities, those you only experience by living in a house day after day, are what ultimately matter. Is the plumbing sound? Does the wiring hold? Are the walls insulated, the air quality safe, the foundation solid?
In AI, these non-observable qualities are reliability, reproducibility, auditability, and resilience. They do not appear in a demo, but they determine whether the system can be trusted at scale. Under the U.K. Prudential Regulation Authority SS1/23 principles, these qualities must be embedded across the model risk management (MRM) lifecycle with clear accountability to named Senior Management Function holders.
A house with leaking pipes is uninhabitable. In AI, the plumbing is how data flows into the system: how information is chunked, retrieved, and grounded.
Chandrakant Maheshwari
Structure-preserving chunking ensures rules are segmented into clauses, tables, and definitions with anchors back to the source. Cross-references must travel with chunks, or meaning is lost. Data lineage must be clear, so every output maps to a traceable origin, a core requirement under DORA's ICT risk management framework and the EU AI Act’s data governance obligations.
Privacy by design must be embedded at this layer: Minimization of personally identifiable information (PII), jurisdictional storage constraints, vendor "no-retain" clauses, and Data Protection Impact Assessments (DPIAs) for high-risk processing. Without sound plumbing that respects both factual grounding and data protection principles, the risk of inaccuracy and non-compliance seeps through the system.
Good wiring powers the house; bad wiring sparks fires. For AI, wiring is the governance framework that channels energy safely.
Hybrid retrieval acts as circuit breakers, filtering by jurisdiction, dates, and versions. Confidence thresholds ensure the system abstains when evidence is weak, a model risk mitigant explicitly recognized in SS1/23 Principle 5. Bounded variability and replay-ability replace any illusion of strict determinism: Temperature-controlled LLMs are not deterministic, but fixed model versions + fixed prompts + fixed retrieval snapshots + logged seeds enable reproducible outputs and explainable variance bands that auditors can validate.
Strong wiring, with clear accountability under the Senior Managers & Certification Regime, gives risk managers confidence that outputs will not ignite unintended consequences.
A doorknob may seem small, but if it wobbles, the entire house feels cheap. In AI, small user-facing details build or erode trust:
Sources and effective dates should be visible in every output, meeting transparency obligations under EU AI Act Article 13 for high-risk systems. Responses should adapt by roles like analyst, auditor, executive etc., without altering citations. Tables and thresholds must be retrieved precisely, not paraphrased.
Marketing claims must be evidence-backed and consistent with actual capability. The SEC has already sanctioned firms for AI-washing; a pre-publication claims verification process is essential to avoid regulatory action and reputational damage.
Trust is built in these details. Without them, adoption falters.
Every livable house has secure storage. In regulated AI, the records vault preserves business communications, model outputs, and audit trails to meet strict retention requirements.
SEC Rule 17a-4 mandates that broker-dealers preserve business communications and records with immutability (Write Once, Read Many storage), defined retention periods, and full searchability. Your audit-replay log capturing prompts, responses, retrieval results, model versions, and seeds must be designed to meet these standards.
Under DORA, ICT-related incidents must be classified, logged, and reported to supervisory authorities with comprehensive records. The records vault is not optional infrastructure; it is the foundation of audit readiness and regulatory compliance.
The unseen parts of a house, insulation, vents, joists define livability. In AI, subtle safeguards define defensibility.
Versioning ensures only rules valid at the relevant time are surfaced. Exception- and negation-aware retrieval prevents misapplication. Numeric- and unit-sensitive matching ensures thresholds are applied correctly. Every claim must map to a cited clause, or the system abstains.
Supply-chain integrity extends these controls to your RAG (Retrieval-Augmented Generation) corpus: signed documents, source allow-lists, canary documents to detect unauthorized access, adversarial scans for retrieval poisoning, and roll-back capabilities for tainted indices. Under DORA's third-party oversight requirements, the security of critical ICT service providers and the integrity of their data pipelines are direct compliance obligations.
These invisible mechanisms are what auditors, independent validators under SS1/23 Principle 4, and regulators will test.
Even the best houses degrade without upkeep. Roofs leak, paint fades, systems drift. AI is no different.
Change management requires refreshing embeddings and indexes when policies update, with version control and impact assessment. Golden-set evaluations stress-test the system against adversarial queries, documenting performance within defined variance bands. Human-in-the-loop review escalates low-confidence answers for expert validation.
Under DORA, financial entities must conduct annual reviews of their ICT risk management frameworks and business continuity policies. SS1/23 mandates ongoing model monitoring, with reports to the audit committee on MRM effectiveness. Maintenance is what keeps an AI system reliable long after its launch – and is what keeps it compliant as regulations evolve.
Governance Accountability: Who Owns What
This is the deeper truth risk professionals must hold on to. Skyscrapers rise fast; their exteriors transform skylines in months. But the interiors, the plumbing, wiring, fixtures, inspections take far longer. That is the work that makes a building habitable.
Generative AI is being built faster than many professionals can track. Each release appears more advanced than the last. But until the finishing work is done, and until auditability, lineage, controls, and safeguards are in place, these systems remain what they currently are: magnificent shells. Beautiful façades, but in regulatory terms, walking zombies.
The pace of model releases should not provoke panic. No one moves into a skyscraper the day its exterior is complete. Risk and compliance professionals own the slower, indispensable work of finishing the details that make AI safe, defensible, and trustworthy.
The future of AI in regulated industries will not be decided by the size of the models or the brilliance of their outputs. It will be decided by the smallest, least glamorous details: the invisible craftsmanship that makes systems reliable every day.
A house with a glossy façade but faulty plumbing is unlivable. Likewise, an AI model with dazzling demos but weak foundations is unusable at scale.
The House of AI will stand or fall, not on its grandeur, but on the strength of its smallest details. For risk professionals, this is both the challenge and the opportunity: to ensure the finishing work is done, so that the impressive skyline becomes a safe, sustainable place to live.
Chandrakant Maheshwari is a lead model validator at Flagstar Bank with over 20 years of experience in financial risk management and financial crime prevention. He is a course author for ACAMS and focuses on ensuring that advanced analytics and generative AI models are used responsibly in regulated financial environments. Chandrakant is the author of two forthcoming books on model risk and financial crime model validation, to be published by Elsevier and Springer in 2026. He has a blog at https://chandrakant721.wordpress.com/