Problems with LLMs


Prompt Engineering

This is a dangerous thing to do – someone asks a question, and we massage the question to give a predetermined answer. It may be well-intentioned, and the answer may be relevant to the question to start with, but inevitably the massaging will drift away from the intent of the question.

Absorbing Local Information

We set up a test case in which the description of a manatee (a “sea cow” weighing up to 900 kg) was changed to describe a cat, and then asked questions that could be answered from the information supplied. After promising to use the supplied text, the LLM proceeded to ignore it. You might have imagined that it would have seen the body weight and turned to information on big cats (a tiger can weigh 300 kg), but it stuck resolutely to domestic cats (a good demonstration of how brainless LLMs are).

The straight-out lying about what it is going to do is a bit of a worry – how do you trust it after that?

The Rise of a New Idea

The LLM is trained by letting it scour text on the internet and build chains of word propinquity (which words tend to appear near which). The problem with that is that existing ideas predominate. A new idea might come along, but on the basis of word propinquity it will never get a look in. People might read the new idea and recognise its import. An LLM is never going to do that, because it doesn’t work on meaning. Taken to the extreme, mass use of LLMs would herald a new Dark Ages, where nothing ever changes.
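The way frequency swamps a new idea can be seen in a deliberately crude sketch. It counts only which word immediately follows which, a stand-in for “word propinquity” and far simpler than a real LLM. The corpus, the function names and the 50-to-1 ratio are all assumptions made up for illustration.

```python
from collections import Counter, defaultdict


def build_bigram_counts(sentences):
    """Count how often each word is immediately followed by each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts


def most_likely_next(counts, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical corpus: the established idea appears 50 times, the new idea once.
corpus = ["ulcers are caused by stress"] * 50 + ["ulcers are caused by bacteria"]
counts = build_bigram_counts(corpus)

print(most_likely_next(counts, "by"))  # stress
print(counts["by"]["bacteria"])        # 1
```

The new idea is in the data, but a method that picks the most frequent continuation will not surface it until it has been repeated more often than the old one.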

Complex Text

Complex text such as legislation or project specifications is structured to help people find their way around. Things like

        For the purposes of this Act, a person covered by paragraph (c), (d) or (e) is taken to be an employee of ASD.

        information obtained by an authorised officer under Part 13, 14 or 15;

        and includes FTR information (within the meaning of the Financial Transaction Reports Act 1988).

        eligible gaming machine venue has the meaning given by section 13.

        (a)  is covered by item 31 or 32 of table 1 in section 6;

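(from the Anti-Money Laundering Act)

are par for the course. Each fragment only makes sense once its references to other parts of the document have been followed, so these don’t look like useful input for methods which work on word propinquity.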
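To make the point concrete, here is a minimal sketch that pulls the cross-references out of one of the quoted fragments. The regular expression is an assumption that covers only the forms seen in the quotes above (paragraph, section, item, table, Part); real legislation uses many more, and the function name is hypothetical.

```python
import re

# One reference target: a bracketed label such as (c), or a number such as 13.
ID_PATTERN = r"(?:\(\w+\)|\d+\w*)"
ID_RE = re.compile(ID_PATTERN)

# A reference kind followed by one or more targets joined by ",", "or", "and".
# Assumed pattern for illustration only; it is not a complete grammar.
CROSS_REF = re.compile(
    rf"\b(paragraph|section|item|table|Part)\s+"
    rf"({ID_PATTERN}(?:(?:,\s*|\s+or\s+|\s+and\s+){ID_PATTERN})*)"
)


def find_cross_references(text):
    """Return (kind, target) pairs for each cross-reference found in `text`."""
    refs = []
    for match in CROSS_REF.finditer(text):
        kind = match.group(1)
        for target in ID_RE.findall(match.group(2)):
            refs.append((kind, target))
    return refs


sample = "is covered by item 31 or 32 of table 1 in section 6;"
print(find_cross_references(sample))
# [('item', '31'), ('item', '32'), ('table', '1'), ('section', '6')]
```

Nearly everything in that line is a pointer to somewhere else. The words that sit next to each other say very little until the pointers have been resolved, which is a structural job, not a statistical one.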

The size of the documents is another problem. A piece of legislation can run to a thousand pages, and the claim that no one has ever read it in its entirety is entirely believable. The specification for the F-35’s undercarriage runs to 3000 pages. This was a DoD project where wastage in the hundreds of billions occurred through inability to understand what the specification was saying. The specifications hugely exceeded a human’s comprehension limit, so wastage was the only way to find out what was required.

Standalone Document

Some documents have a long life and have to be read on their own terms. There is no point conflating the specification for the Hunter class frigates (starting to be built “real soon now”) with frigates designed years ago, nor, when the project may take 10 years, trying to blend 2022 technology with all the whiz-bang ideas for Anti-Submarine Warfare in 2030.

More Is Better

The current notion is that the problems of LLMs can be fixed by making the data collection phase larger – a hundred times larger, or a thousand times larger, or (whisper it) a million times larger. This is extremely unscientific – there won’t just be diminishing returns, but negative returns. What is required is a grasp of what words mean, so that there is understanding, not just statistics.

When the word “unprecedented” is being thrown around so casually, we need to think about what we are doing, not fall back on something that promises “no need to think”. The world is changing under our feet.

Word propinquity, as used in LLMs, has severe limitations.



