
Tech Perspectives

AI Risk: The Inefficiencies of Existing Oversight, and What Risk Managers Can Learn From Science Fiction

Regulations and government guidelines for AI are flawed and do not address the key risks presented by this complex, fast-evolving technology. Financial risk managers who want to measure and mitigate current AI risks can benefit more from reading science fiction than from trying to follow official pronouncements.

Friday, December 22, 2023

By Aaron Brown


Runaway artificial intelligence has been a major concern of science fiction at least since the 1909 publication of E. M. Forster’s The Machine Stops, but it took 114 years to get serious official attention.

On January 26, 2023, the National Institute of Standards and Technology (NIST) released its AI Risk Management Framework. Many other documents followed, most recently President Biden’s October 30 executive order on Safe, Secure, and Trustworthy Artificial Intelligence. Two days later, 28 countries and the European Union signed the Bletchley Declaration on AI Safety.


Unfortunately, none of these official documents or others I have seen focus on AI’s essential threats, and none incorporate professional risk management best practices. For the moment anyway, risk managers will do better to consult science fiction discussions than to rely on official standards.

Both the Bletchley Declaration and President Biden’s executive order emphasize the dangers of computer control of important functions. But this concern is many decades out of date. Since the mid-20th century, computers have been flying our airplanes, controlling our power grids, operating our nuclear missiles, organizing financial trading, and generally running things. More recently, computer chips have been embedded in everything, from your microwave to your coffee maker – and increasingly they’re talking to each other.

The dangers of computer control are well known. Software bugs can result in inappropriate actions with sometimes fatal consequences. While this is a serious issue, it is a blind risk: the failures are random and unintentional, not purposeful.

AI Risk Distinctions

AI poses a fundamentally different danger, closer to a malevolent human than to a malfunctioning machine. With AI and machine learning, the human gives the computer objectives rather than instructions. Sometimes these are programmed explicitly; other times the computer is told to infer them from training sets. AI algorithms are tools the computer — not the human — uses to attain the objectives. The danger from a thoughtlessly specified objective is not blind or random.
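To make the distinction concrete, here is a minimal sketch in Python. The actions, thresholds and scores are invented purely for illustration, not drawn from any real system: a conventional program spells out the response to every input, while an objective-driven agent is handed only a score to maximize and will pick whatever action scores highest, including actions its designer never anticipated.

```python
# A hypothetical sketch, not production code: all names and numbers are
# invented to illustrate "instructions" versus "objectives."

def rule_based_thermostat(temp_c: float) -> str:
    """Explicit instructions: the human spells out the response to every input."""
    if temp_c < 19.0:
        return "heat_on"
    if temp_c > 23.0:
        return "heat_off"
    return "no_change"

def objective_driven_agent(candidate_actions, score) -> str:
    """Objective only: the agent picks whichever action maximizes the score
    the human supplied, including actions the human never anticipated."""
    return max(candidate_actions, key=score)

# A poorly specified objective: "keep the reported temperature in range."
actions = ["heat_on", "heat_off", "open_window", "disable_thermometer"]
score = lambda a: {"heat_on": 0.2, "heat_off": 0.1,
                   "open_window": 0.3, "disable_thermometer": 0.9}[a]

print(rule_based_thermostat(17.5))             # -> "heat_on"
print(objective_driven_agent(actions, score))  # -> "disable_thermometer"
```

The first function can only do what it was told; the second does whatever best serves the score it was given, which is exactly where a thoughtlessly specified objective becomes dangerous.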

Consider Arthur C. Clarke’s famous HAL 9000 computer in the 1968 film, 2001: A Space Odyssey. HAL malfunctions not due to a computer bug, but because it computes correctly that the human astronauts are reducing the chance of mission success — its programmed objective.

Its plan to kill the humans is not blind; it is premeditated. It cannot be caught by testing or diagnostics, because HAL knows to evade those things. Moreover, the damage is not limited. Even after HAL kills Frank Poole, the other astronaut, Dave Bowman, is not safe, and in principle HAL could go on to do any amount of damage until it is defeated like a human enemy.

This is quite distinct from a dumb computer program, where a human spells out the program’s desired response to all inputs. Sometimes the programmer makes errors that are not caught in testing. The worst errors are usually unexpected interactions with other programs rather than individual program bugs.

When software bugs or computer malfunctions do occur, they lead to random results. Most of the time the consequences are minor, and in any case the damage is limited. The bug might cause a crash or other disaster, but people notice those things and shut down the computer until the problem is fixed.

The other key risk distinction between dumb and smart programs is that the damage from dumb programs is limited to what they control. The conventional computer controlling a nuclear power plant might cause a meltdown in the plant, but it can’t fire nuclear missiles or crash the stock market. An AI algorithm, on the other hand, could find ways to communicate with other computers to accomplish its objectives.

A Flawed Executive Order

President Biden’s executive order on AI risk has eight planks. The first calls for testing and care in deploying AI in life-critical systems – the tactics used for dumb programs, and precisely the ones that are inadequate for AI. The remaining seven planks require additional goals to be built into all AI implementations, such as reducing unemployment, advancing social equality and improving civil liberties.

These are all worthy goals, but not ones humans should delegate to machines. Moreover, the more goals added to an AI algorithm, the more chance of unexpected and perverse outcomes. For example, an AI algorithm for a self-driving car should work to get the driver to the destination safely, regardless of income level or employment status. No one wants a car that refuses to take a high-income person to work because doing so will increase income inequality – or that figures out the most direct way to reduce unemployment is to kill unemployed people.
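A toy calculation makes the point. In the sketch below (every weight and number is an invented assumption), a self-driving objective that also penalizes measured increases in inequality starts refusing trips once the equality weight is large enough; the perverse outcome falls straight out of the arithmetic, with no malice required.

```python
# Hypothetical sketch: invented outcomes and weights, only to show how
# stacking goals onto one objective can flip the optimal action.

routes = {
    "drive_safely_to_work": {"trip_completed": 1.0, "inequality_delta": +0.1},
    "refuse_trip":          {"trip_completed": 0.0, "inequality_delta":  0.0},
}

def combined_objective(outcome, weight_equality):
    # Reward completing the trip, penalize any increase in measured inequality.
    return outcome["trip_completed"] - weight_equality * outcome["inequality_delta"]

for w in (0.0, 20.0):  # a zero versus an aggressive equality weight
    best = max(routes, key=lambda r: combined_objective(routes[r], w))
    print(f"weight={w:5.1f} -> chosen action: {best}")
# weight=  0.0 -> chosen action: drive_safely_to_work
# weight= 20.0 -> chosen action: refuse_trip
```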

The positive aspect of the NIST standards is that they recognize that risk is two-sided — that the goal is not to minimize risk but to choose the optimal level for robust, positive innovation with controlled and survivable downside possibilities. But they insist this can be done by estimating probabilities and expected outcomes, an approach Nassim Taleb calls the “ludic fallacy.”

Uncertainty that can be quantified in the way NIST recommends can be handled without sophisticated risk management. However, radical innovations with unlimited opportunities and existential dangers cannot be risk-managed with expected value estimates. (In fact, ludic risk management fails even for everyday financial trading.)

Remember, long-term outcomes are dominated by Black Swans — low-probability, high-impact events that occur because they are unexpected. If there were enough data and theory for reliable probability and impact estimates (per the NIST approach to AI), Black Swans would not exist.
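A short simulation illustrates why. In the hypothetical payoff below (the probabilities and loss size are assumptions chosen only for illustration), a rare but ruinous loss will usually not appear in a finite data sample at all, so a sample-based expected-value estimate of the kind the NIST approach relies on tends to report a comfortably positive figure even though the true expectation is deeply negative.

```python
# Hedged illustration with invented numbers: a rare catastrophe hides from
# sample-based expected-value estimates.

import random

def one_period_pnl(rng):
    # 99.9% of periods: a small gain; 0.1% of periods: a ruinous loss.
    return 1.0 if rng.random() > 0.001 else -5000.0

rng = random.Random()
sample = [one_period_pnl(rng) for _ in range(500)]  # a plausible-sized history
estimated_ev = sum(sample) / len(sample)
true_ev = 0.999 * 1.0 + 0.001 * (-5000.0)           # = -4.001 per period

print(f"estimated expected value from 500 observations: {estimated_ev:+.3f}")
print(f"true expected value:                            {true_ev:+.3f}")
# In roughly 60% of runs the 500-observation sample contains no catastrophe,
# so the estimate comes out near +1.0 while the true expectation is about -4.0.
```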

To see the failure of risk management imagination in President Biden’s executive order on AI and in other government documents, compare them to what science fiction authors have produced. Isaac Asimov famously considered the problem in 1939 and proposed his three laws of robotics in 1942. His key idea was to build morality deeply into the structure of all AI.

Consider, for example, Asimov’s first law: “A robot may not injure a human being or, through inaction, allow a human being to come to harm.” This law was intended as a basic safety feature to prevent obvious bad behavior. It is primary – whatever the other laws or the AI’s goals say, the robot cannot violate it.
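The structural idea can be sketched in a few lines of code (the function and action names below are illustrative inventions, not drawn from Asimov or any real system): the safety rule is not one more term in the objective to be traded off, but a hard filter applied before any objective is optimized.

```python
# Hypothetical sketch of a "primary law" as a hard constraint, not a weighted goal.

def choose_action(candidate_actions, predicts_human_harm, objective):
    """Optimize the objective only over actions that pass the primary rule."""
    permitted = [a for a in candidate_actions if not predicts_human_harm(a)]
    if not permitted:
        return None  # refuse to act rather than violate the primary rule
    return max(permitted, key=objective)

# Toy example: the highest-scoring action is forbidden, so it is never chosen,
# no matter how attractive the objective makes it look.
actions = ["vent_airlock", "reroute_power", "wait"]
harmful = lambda a: a == "vent_airlock"
objective = lambda a: {"vent_airlock": 10.0, "reroute_power": 3.0, "wait": 1.0}[a]

print(choose_action(actions, harmful, objective))  # -> "reroute_power"
```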

While making morality a foundation of all AI seems like a self-evident good idea, Asimov’s stories deal with the many problems of this effort: the way even the simplest moral rules can lead to contradictions or perverse results, or can be exploited by malevolent humans or AIs.

Humans struggle to develop universal moral codes — for example, what is a self-driving car supposed to do if the only way to prevent a crash with a school bus that might kill dozens of people is to send itself with a single driver over a cliff? AIs will be making millions of decisions of this sort and even more consequential ones. But do we want them choosing without a moral code for guidance?

If only some AIs have morality built in, or if different implementations have different moral systems, we face the problem of intelligence emerging from combinations of AI entities. An emergent AI could pick and choose among different codes to find ways to do evil.

Asimov argued that AI moral laws must be simple, built into the foundations of the code and the same for all robots. Complex codes, he realized, can be twisted and rationalized, and safety features bolted on after the fact can be circumvented.

AI’s ability to copy us presents another potential problem. Human mental failings have survived millions of years of evolution, and modern AI can mimic nearly all of them – including paranoia, delusions, schizophrenia, monomania, greed, sadism, suicidal depression, and sociopathy. (Without a moral code, in fact, one could argue that all AI systems are designed to be sociopathic.)

What’s more, AI could also develop maladies that are too destructive to be evident in humans today.

The Importance of Manual Overrides

If we choose to require AI to conform to a moral code, we need to do that today, not wait until amoral AIs are embedded everywhere. Yet none of the official government or regulatory documents I’ve seen even consider this type of precaution.

Clarke took an engineering approach. His idea was that humans needed to ensure there were manual overrides to AI, outside the knowledge and control of AI systems. That’s how the David Bowman character in 2001 is able to outmaneuver HAL 9000, using physical door interlocks and eventually pulling the plug on HAL’s AI functions.

This also seems like a self-evidently good idea. My self-driving Tesla has something that looks like a steering wheel (actually, a steering yoke), but it has no physical connection to the wheels. If I twist it while the AI is controlling the car, the AI will turn off and the car will turn as I direct. But the AI chooses to obey; turning the yoke only sends a signal to the computer and does not directly move the wheels. If I try to steer into a barrier, the car will not obey me. On the other hand, if my Tesla decided to kill me or run someone else down, I would have no way to prevent it.

This is another example of a precaution that must be adopted now, as a fundamental design principle to be followed before turning over any system to AI. There must be manual overrides, systems must have a plug that can be pulled, and AI algorithms should not be told about either.
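One way to picture the principle is an actuation layer that sits outside the AI entirely (a hedged sketch; all class and signal names below are invented for illustration): when the human engages the override, the AI’s command is simply ignored, and nothing in the AI’s control path can veto that choice.

```python
# Hypothetical sketch of an override implemented outside the AI's control path.

from dataclasses import dataclass

@dataclass
class HumanInputs:
    override_engaged: bool   # e.g., a hard-wired switch or physical interlock
    manual_command: float    # e.g., the steering angle requested by the human

def actuator_command(ai_command: float, human: HumanInputs) -> float:
    """Runs outside the AI system: the AI never sees, and cannot veto, this branch."""
    if human.override_engaged:
        return human.manual_command  # human input goes straight to the actuator
    return ai_command                # otherwise the AI's command passes through

print(actuator_command(ai_command=0.4,
                       human=HumanInputs(override_engaged=True, manual_command=-0.8)))
# -> -0.8: the human's command wins regardless of what the AI requested
```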

Science fiction literature is rich with discussions of precisely these issues, with many AI ideas – and also many examples of how ideas might backfire. Risk managers would do better to peruse these stories than to rely on official pronouncements by people who don’t see the fundamental risk of AI, and who have little experience making consequential day-to-day risk decisions with clear, objective outcomes.

 

Aaron Brown has worked on Wall Street since the early 1980s as a trader, portfolio manager, head of mortgage securities and risk manager for several global financial institutions. Most recently, he served for 10 years as chief risk officer of the large hedge fund AQR Capital Management. He was named the 2011 GARP Risk Manager of the Year. His books on risk management include The Poker Face of Wall Street, Red-Blooded Risk, Financial Risk Management for Dummies and A World of Chance (with Reuven and Gabriel Brenner). He currently teaches finance and mathematics as an adjunct and writes columns for Bloomberg.

 




