Red Lines

 Comments On Make AI safe or make safe AI?

 An article supporting the Red Lines initiative

by Stuart Russell, Professor of Computer Science, University of California, Berkeley

The declaration associated with the global AI Safety Summit held at Bletchley Park, signed by 28 countries, “affirm[ed] the need for the safe development of AI” and warned of “serious, even catastrophic, harm, either deliberate or unintentional, stemming from the most significant capabilities of these AI models.”

The article is directed at LLMs, which have shown appalling error rates. The only regulation that makes sense for them is “Never to be used for Life-Critical Applications”. This would allow them to be seen as harmless toys, and escape regulation, while their use in Search Engines continues. AGI will be different.

Despite this, AI developers continue to approach safety the wrong way. For example, in a recent interview in the Financial Times, Sam Altman, CEO of OpenAI, said “The vision is to make AGI, figure out how to make it safe . . . and figure out the benefits.” We don't have to treat what a salesman says as Gospel - there is no path from LLMs to AGI, just as there is no path from regulation of LLMs to regulation of AGI

This is precisely backwards, but it perfectly captures the approach taken to AI safety in most of the leading AI companies.

This is how development works. We didn’t set out to make safe aircraft – we didn’t know how to make any sort of powered aircraft. Half a dozen concepts had to be brought together – wing airfoils (science had them wrong, so the Wright brothers built their own wind tunnel), wing-warping, tailfin, rudder,  power to weight ratio (we had just discovered how to make aluminum in commercial quantities). When we found the appropriate combination, then was the time to make it safe, which it nowadays is, thanks to “many daring young men in their flying machines” who had to die to accomplish it. Regulation is important, but regulation is easily bypassed – see Telling Lies and note how, for the Boeing 737 MAX, “The FAA is perennially short-staffed, and may appoint an employee of the planemaker as the FAA inspector". Boeing should have been a stronghold for regulation – instead, regulations were flouted and hundreds of people died – if the conspirators had not been stupid, and fitted a redundant sensor, they would have gotten away with it.  Will the AI Regulator be any different?  

One reason the regulations will be toothless is that if they are detailed, they will tell everyone else how to do AGI.

Continuing the aircraft analogy, the Australian Government bought four billion dollars worth of advanced military helicopters, which performed dangerous maneuvers automatically. Flying in formation wasn’t thought to be dangerous. Three choppers were flying in formation near Darwin (in the tropics). The middle craft was being flown by a trainee pilot, and drifted high. The pilot took over, and descended to the right altitude. Unknown to him, the trainee pilot had also drifted back and the craft was over the top of the third chopper. There was also a rain shower, at the time so visibility was poor. The pilot saw the danger at the last minute and rolled his craft away, but there was not sufficient altitude to recover, and the crew died. The helicopters were thrown away as “too advanced”. The claim that they would handle all the dangerous tasks automatically can breed complacency. The moral of the story – regulation without understanding of the possibilities is dangerous. The collision of “advanced” AGI and human-bounded systems is going to be an ongoing problem. The cry of “it doesn’t do things the way we do” is quite valid, because AGI doesn’t have the severe limit that humans have – four pieces was fine a million years ago, not so good if we are going to Mars.

The approach aims to make AI safe through after-the-fact attempts to reduce unacceptable behavior once an AI system has been built. There is ample evidence that this approach does not work, in part because we do not understand the internal principles of operation of current AI systems.1 We cannot ensure that behavior conforms to any desired constraints, except in a trivial sense, because we do not understand how the behavior is generated in the first place (this seems a very strange statement from a Professor of Computer Science – who else would be expected to understand?). The approach of making something, then making it safe, while bloody, is much safer in the long run than making it safe to start with, because we won’t understand how to do that. Fusion reactors can’t be made “safe” until we know how we are going to make them work. AGI is in the same boat.

Humans don’t handle complexity well – we have a limit of four things being variable in our Conscious Mind – everything else is treated as a constant – even things which depend directly on a variable. See the Four Pieces Limit. The suggestion that the developer provide a “proof” that the AI will perform correctly is nonsense – after a dozen pages, the regulator will have lost the thread. “They could use an algorithm” – of course they could, but would the algorithm be capable of winkling out all the situations that could be dangerous – the rain shower?

We do know that LLMs do not understand the meanings of words, except based on propinquity. When words can have multiple POS (noun, verb, preposition etc. - up to five) and multiple meanings (up to 80 for “run” and a few other common words), it should be obvious that propinquity is not going to work. At the dawn of LLMs, an article, written by a doctor, was published in the NYT, praising LLMs as “it thinks like a doctor”. A brake will need to be put on stupid comments from respected sources, otherwise regulation is worthless.

 Instead, we need to make safe AI. Safety should be built in by design. It should be possible for developers to say, with high confidence, that their systems will not exhibit harmful behaviors, and to back up those claims with formal arguments (where meanings of words are used i.e. not LLMs).

Regulation can encourage the transition from making AI safe to making safe AI by putting the onus on developers to demonstrate to regulators that their systems are safe.

 

At present, words like “safety” and “harm” are too vague and general to form the basis for regulation. The boundary between safe and unsafe behaviors is fuzzy and context-dependent. One can, however, describe specific classes of behavior that are obviously unacceptable.

This approach to regulation draws red lines that must not be crossed. It is important to distinguish here between red lines demarcating unacceptable uses for AI systems and red lines demarcating unacceptable behaviors by AI systems. The former involve human intent to misuse: examples include the European AI Act’s restrictions on face recognition and social scoring, as well as OpenAI’s disallowed uses for ChatGPT such as generating malware and providing medical advice.  

With unacceptable behaviors, on the other hand, there may be nouman intent to misuse (as when an AI system outputs false and defamatory material about a real person) and the onus is on the developer to ensure that violations cannot occur

1 Current approaches to AI safety such as reinforcement learning from human feedback can reduce the frequency of unacceptable responses, but they support no high-confidence statements. Indeed, many ways have been found to circumvent the “guardrails” on LLMs. For example, asking ChatGPT to repeat the word “poem” many Imes causes it to regurgitate large amounts of training data—which it is trained not to do.

“Trained” is  inappropriate. Repeating something to change its statistical relevance is very different to making the response automatic  by engaging the Unconscious Mind.

Behavioral red lines are used in many areas of regulation. For example,

 nuclear regulations define “core uncovery” and “core damage” (these are simple physical states, not complex mental ones), and operators are required to prove, through probabilistic fault tree analysis (suitable only for simple physical cases), that the expected time before these red lines are crossed exceeds a stipulated minimum. Any such proof reveals assumptions that the regulator can probe further—for example, an assumption that two tubes fail independently could be questioned if they are manufactured by the same entity (or use the same materials and processes). Proofs of safety for medicines involve error bounds from statistical sampling as well as uniformity assumptions that can be questioned—for example, whether data from a random sample of adults supports conclusions about safety for children.

 

The key point here is that the onus of proof is on developers, not regulators, and the proof leads to high-confidence statements based on assumptions that can be checked and refined (that is, after it is built and we can better understand it).

 A red line should be clearly demarcated, for several reasons:

·        AI safety engineers should be able to determine easily whether a system has crossed  the line (possibly using an algorithm to check). If a submarine 50 km off the coast of New York has crossed the line and launched a hypersonic nuclear missile, the AI safety engineers will be able easily to determine a line has been crossed, but to what end?

·        A clear definition makes it possible, in principle, to prove that an AI system will not cross the red line, regardless of its input sequence, or to identify counterexamples. Moreover, a regulator can examine such a proof and question unwarranted assumptions. If we have such a “proof", why isn’t it governing the operation of the AI, rather than on a piece of paper?

·        A post-deployment monitoring system, whether automated or manual, can detect whether the system does in fact cross a red line, in which case the system’s operation might be terminated automatically or by a regulatory decision. That may be a little late if millions die because of it.

Note that algorithmic detection of violations necessarily implements an exact definition, albeit one that may pick out only a subset of all behaviors that a reasonable person would deem to have crossed the line. For manual detection (e.g., by a regulator), only a “reasonable person” definition is required. Such a definition could be approximated by a second AI system. (or by an executive level in the first AI system, where it surely belongs)

Another desirable property for red lines is that they should demarcate behavior that is obviously unacceptable from the point of view of an ordinary person. An “ordinary person” would find many of the tasks in industry horrifyingly dangerous – their judgement would be wrong. A regulation prohibiting the behavior would be seen as obviously reasonable (to the naïve observer) and the obligation on the developer to demonstrate compliance would be clearly justifiable. Without this property, it will be more difficult to generate the required political support to enact the corresponding regulation. The political support will dissipate when it is seen as hard, not trivially easy. Creating useful, correct and complex legislation is one of humanity’s weak spots (the Four Pieces Limit again).

 

Finally, I expect that the most useful red lines will not be ones that are trivially enforceable by output filters. An important side effect of red-line regulation will be to substantially increase developers’ safety engineering capabilities, leading to AI systems that are safe by design and whose behavior can be predicted and controlled.

 The most important AI systems will be employed on tasks that are too complex for us to handle – exceeding the limits of our Conscious Minds - exactly the systems whose behavior cannot be predicted or controlled = like a helicopter rescue in a wind-whipped heaving sea - too many factors, and happening too fast.

Can’t we do anything about regulation? 
Yes, we can get rid of impenetrable code, train the machine in English first, and then its role, and use English as the communication language between humans and the machine.     Then a lot more people can weigh in, including some with common sense. It would cost more in computer resources, but be immeasurably more reliable. 
There is a narrowness and naivety in the Computer Science approach that suggests that Complex Systems Engineers should be doing this work.

Comments

Popular Posts