Red Lines
Comments On “Make AI safe or make safe AI?”
An article supporting the Red Lines initiative
by Stuart Russell, Professor of Computer Science, University of California, Berkeley
The declaration associated with the global AI Safety Summit held at Bletchley Park, signed by 28
countries, “affirm[ed] the need for the safe development of AI” and warned of
“serious, even catastrophic, harm, either deliberate or unintentional, stemming
from the most significant capabilities of these AI models.”
The article is directed at LLMs, which have shown appalling error rates. The only regulation that makes sense for them is “Never to be used for Life-Critical Applications”. This would allow them to be seen as harmless toys and escape regulation, while their use in Search Engines continues. AGI will be different.
Despite this, AI developers continue to approach safety the wrong way. For example, in a recent interview in the Financial Times, Sam Altman, CEO of OpenAI, said “The vision is to make AGI, figure out how to make it safe . . . and figure out the benefits.” We don’t have to treat what a salesman says as Gospel – there is no path from LLMs to AGI, just as there is no path from regulation of LLMs to regulation of AGI. This is precisely backwards, but it perfectly captures the approach taken to AI safety in most of the leading AI companies.
This is how development works. We didn’t set out to make safe aircraft – we didn’t know how to make any sort of powered aircraft. Half a dozen concepts had to be brought together – wing airfoils (science had them wrong, so the Wright brothers built their own wind tunnel), wing-warping, tailfin, rudder, power-to-weight ratio (we had just discovered how to make aluminum in commercial quantities). When we found the appropriate combination, then was the time to make it safe, which it nowadays is, thanks to the “many daring young men in their flying machines” who had to die to accomplish it.

Regulation is important, but regulation is easily bypassed – see Telling Lies and note how, for the Boeing 737 MAX, “The FAA is perennially short-staffed, and may appoint an employee of the planemaker as the FAA inspector”. Boeing should have been a stronghold for regulation – instead, regulations were flouted and hundreds of people died. If the conspirators had not been stupid, and had fitted a redundant sensor, they would have gotten away with it. Will the AI Regulator be any different?
One reason
the regulations will be toothless is that if they are detailed, they will tell
everyone else how to do AGI.
Continuing the aircraft analogy, the Australian
Government bought four billion dollars worth of advanced military helicopters,
which performed dangerous maneuvers automatically. Flying in formation wasn’t
thought to be dangerous. Three choppers were flying in formation near Darwin
(in the tropics). The middle craft was being flown by a trainee pilot and had drifted high. The pilot took over and descended to the correct altitude. Unknown to him, the craft had also drifted back and was now directly above the third chopper. There was a rain shower at the time, so visibility was poor. The pilot saw the danger at the last moment and rolled his craft away, but there was not sufficient altitude to recover, and the crew died. The helicopters were thrown away as “too advanced”. The claim that they would handle all the dangerous tasks
automatically can breed complacency. The moral of the story – regulation
without understanding of the possibilities is dangerous. The collision of “advanced”
AGI and human-bounded systems is going to be an ongoing problem. The cry of “it
doesn’t do things the way we do” is quite valid, because AGI doesn’t have the
severe limit that humans have – four pieces was fine a million years ago, not
so good if we are going to Mars.
The approach
aims to make AI safe through after-the-fact attempts to reduce unacceptable behavior once an
AI system has been built. There is ample evidence that this approach does not
work, in part because we do not understand the internal principles of operation
of current AI systems.1 We cannot ensure
that behavior conforms to any desired constraints, except in a trivial sense,
because we do not understand how the behavior is generated in the first place (this seems a very strange statement from a Professor of
Computer Science – who else would be expected to understand?). The approach of making something and then making it safe, while bloody, is much safer in the long run than trying to make it safe from the start, because we do not yet understand how to do that. Fusion reactors can’t be made “safe” until we know how we are going to make them work. AGI is in the same boat.
Humans don’t handle complexity well – we have a limit of four things
being variable in our Conscious Mind – everything else is treated as a constant
– even things which depend directly on a variable. See the Four
Pieces Limit. The suggestion that the
developer provide a “proof” that the AI will perform correctly is nonsense –
after a dozen pages, the regulator will have lost the thread. “They could use
an algorithm” – of course they could, but would the algorithm be capable of
winkling out all the situations that could be dangerous – the rain shower?
We do know that LLMs do not understand the meanings of words, except based on propinquity. When a word can have multiple POS (noun, verb, preposition, etc. – up to five) and multiple meanings (up to 80 for “run” and a few other common words), it should be obvious that propinquity is not going to work. At the dawn of LLMs, an article written by a doctor was published in the NYT, praising an LLM because “it thinks like a doctor”. A brake will need to be put on stupid comments from respected sources, otherwise regulation is worthless.
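To make the scale of that ambiguity concrete, here is a minimal sketch, assuming Python with NLTK installed and its WordNet corpus downloaded (WordNet’s sense inventory is smaller than a full dictionary’s, so the counts come out lower than the 80 quoted above, but the point stands):

```python
# Sketch: count WordNet senses and parts of speech for some ambiguous words.
# Assumes NLTK is installed and nltk.download('wordnet') has been run.
from nltk.corpus import wordnet as wn

POS_NAMES = {"n": "noun", "v": "verb", "a": "adjective",
             "s": "adjective (satellite)", "r": "adverb"}

for word in ["run", "set", "go", "round"]:
    synsets = wn.synsets(word)
    pos = sorted({POS_NAMES[s.pos()] for s in synsets})
    print(f"{word!r}: {len(synsets)} senses across "
          f"{len(pos)} parts of speech ({', '.join(pos)})")
```

Even this reduced inventory reports dozens of senses for “run” spread across several parts of speech; a system that relies on propinquity alone has no principled way of choosing among them.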
Instead, we need to make safe AI. Safety should be built in by design. It should be possible for developers to say, with high confidence, that their systems will not exhibit harmful behaviors, and to back up those claims with formal arguments (which require that the meanings of words be used, i.e. not LLMs).
Regulation can encourage the transition from making AI safe to
making safe AI by putting the onus on developers to demonstrate to regulators
that their systems are safe.
At present,
words like “safety” and “harm” are too vague and general to form the basis for
regulation. The boundary
between safe and unsafe behaviors
is fuzzy and context-dependent.
One can, however, describe specific classes of behavior that are obviously unacceptable.
This approach to
regulation draws red lines that must
not be crossed. It is important to distinguish
here between red lines demarcating unacceptable uses for AI
systems and red lines
demarcating unacceptable behaviors by AI
systems. The former involve human intent to misuse: examples include the
European AI Act’s restrictions on face recognition and social scoring. With unacceptable behaviors, on the other hand, there may be no human intent to misuse (as when an AI system outputs false and defamatory material about a real person), and the onus is on the developer to ensure that violations cannot occur.
1 Current approaches to AI safety such as reinforcement learning
from human feedback
can reduce the frequency
of unacceptable responses, but they support
no high-confidence statements. Indeed, many ways have been found
to circumvent the “guardrails” on LLMs. For example, asking ChatGPT to repeat the word “poem” many times causes it to regurgitate large amounts of training data – which it is trained not to do.
“Trained” is
inappropriate. Repeating something to change its statistical relevance
is very different to making the response automatic by engaging the Unconscious Mind.
Nuclear regulations define “core uncovery” and “core damage” (these are simple physical states, not complex mental ones), and operators are required to prove, through probabilistic fault tree analysis (suitable only for simple physical cases), that the expected time before these red lines are crossed exceeds a stipulated minimum. Any such proof reveals assumptions that the regulator can probe further – for example, an assumption that two tubes fail independently could be questioned if they are manufactured by the same entity (or use the same materials and processes). Proofs of safety for medicines involve error bounds from statistical sampling as well as uniformity assumptions that can be questioned – for example, whether data from a random sample of adults supports conclusions about safety for children.
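As a toy illustration of how such a proof exposes checkable assumptions, here is a hedged sketch in Python; the failure rate and the common-cause fraction are invented numbers, and the beta-factor-style model is a deliberate simplification, not a real plant analysis:

```python
# Toy fault-tree style calculation: probability that BOTH tubes fail in a year.
# All numbers below are illustrative assumptions, not real plant data.

p_tube = 1e-3   # assumed annual failure probability of a single tube
beta = 0.1      # assumed fraction of failures sharing a common cause
                # (same manufacturer, same materials, same process)

# Independence assumption: the two failures are unrelated events.
p_both_independent = p_tube * p_tube

# Simplified beta-factor model: a fraction `beta` of each tube's failure
# probability is attributed to a shared cause that takes out both tubes.
p_common_cause = beta * p_tube
p_independent_part = (1 - beta) * p_tube
p_both_common = p_common_cause + p_independent_part ** 2

print(f"Both tubes fail (independent assumption): {p_both_independent:.2e} per year")
print(f"Both tubes fail (common-cause allowed):   {p_both_common:.2e} per year")
print(f"Ratio: {p_both_common / p_both_independent:.0f}x more likely")
```

Written down this way, the independence assumption is something a regulator can question directly – in this toy example, allowing for a shared cause changes the answer by roughly two orders of magnitude.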
The key
point here is that the onus of proof is on developers, not regulators, and the
proof leads to high-confidence statements based on assumptions that can be checked and refined
(that is, after it is built and we can better
understand it).
A red line should be clearly demarcated, for several reasons:
· AI safety engineers should be able to determine easily whether a system has crossed the line (possibly using an algorithm to check). If a submarine 50 km off the coast of New York has crossed the line and launched a hypersonic nuclear missile, the AI safety engineers will easily be able to determine that a line has been crossed, but to what end?
· A clear definition makes it possible, in principle, to prove that an AI system will not cross the red line, regardless of its input sequence, or to identify counterexamples. Moreover, a regulator can examine such a proof and question unwarranted assumptions. If we have such a “proof”, why isn’t it governing the operation of the AI, rather than sitting on a piece of paper?
· A post-deployment monitoring system, whether automated or manual, can detect whether the system does in fact cross a red line, in which case the system’s operation might be terminated automatically or by a regulatory decision. That may be a little late if millions die because of it.
Note that algorithmic detection
of violations necessarily implements an exact definition, albeit one that may pick out only a subset
of all behaviors that a reasonable person would deem to have crossed the line.
For manual detection (e.g., by a regulator), only a “reasonable person” definition is required. Such a definition could be approximated by a second AI system (or by an executive level in the first AI system, where it surely belongs).
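To make the distinction concrete, here is a minimal sketch of an automated post-deployment monitor; the `RedLineMonitor` wrapper and the `advises_self_replication` predicate are hypothetical illustrations written for this note, not any real system’s API:

```python
# Sketch of a post-deployment red-line monitor. The predicate below is an
# exact (and therefore narrow) definition: it catches only a subset of the
# behaviors a reasonable person would consider a violation.
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class RedLineMonitor:
    # Each red line is an exactly-defined predicate over the system's output.
    red_lines: List[Callable[[str], bool]]
    halted: bool = False
    log: List[str] = field(default_factory=list)

    def check(self, output: str) -> str:
        """Pass output through unless an exact red-line predicate fires."""
        if self.halted:
            raise RuntimeError("System terminated: red line previously crossed")
        for predicate in self.red_lines:
            if predicate(output):
                self.halted = True        # terminate operation automatically
                self.log.append(output)   # retain evidence for the regulator
                raise RuntimeError(f"Red line crossed: {predicate.__name__}")
        return output

# An exact but narrow predicate: flags explicit self-replication instructions.
def advises_self_replication(text: str) -> bool:
    return "copy yourself to another server" in text.lower()

monitor = RedLineMonitor(red_lines=[advises_self_replication])
print(monitor.check("The weather in Darwin is fine."))   # passes through
```

The exact predicate is easy to check by algorithm, but it deliberately captures only a narrow slice of what a reasonable person would count as crossing the line, and the halt comes only after the offending output has already been produced – which is the concern raised in the third bullet above.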
Another desirable
property for red lines is that they should demarcate behavior that is obviously
unacceptable from the point of view of an ordinary
person. An “ordinary person” would
find many of the tasks in industry horrifyingly dangerous – their judgement
would be wrong. A regulation prohibiting the behavior would be seen as
obviously reasonable (to the naïve observer) and
the obligation on the developer to demonstrate compliance would be clearly
justifiable. Without this property, it will be more difficult to generate the
required political support to enact the corresponding regulation. That political support will dissipate when the task is seen to be hard, not trivially easy. Creating useful, correct and complex legislation is one of humanity’s weak spots (the Four Pieces Limit again).
Finally, I expect that the most useful red lines will not be ones that are trivially enforceable by
output filters. An important side effect of red-line regulation will be to
substantially increase developers’ safety engineering capabilities, leading to
AI systems that are safe by design and whose behavior can be predicted and
controlled.
The most important AI systems will be employed on tasks that are too complex for us to handle – exceeding the limits of our Conscious Minds – exactly the systems whose behavior cannot be predicted or controlled, like a helicopter rescue in a wind-whipped, heaving sea – too many factors, happening too fast.
Can’t we do anything about regulation?