Gary Marcus has criticised LLMs on the basis that they have
the rules of chess in their data, but cannot play chess. Is this a fair comment?
The rules of chess explain the legal moves for the pieces, and a few special moves, such as castling and en passant. The rules do not explain how to play chess: the opening, the middlegame, the endgame, the shifting strategy.
It is unlikely that a person could read about millions of
possible games and learn how to play chess that way (while a machine can do
that easily). Instead, people absorb strategy by losing, drawing, and eventually
winning games. They learn to make moves mostly unconsciously, while reading
about a limited set of gambits and manoeuvres. LLMs have no comparable
mechanism. It might be thought that Machine Learning could do this, but ML
consists of a programmer “training” a network of directed resistors, and that
network is completely inadequate to handle the tens of thousands of states
involved in the course of a game. ML has no ability to extend its own
structure. But the board is only an eight by eight array, so where do the tens
of thousands come from? A piece (other than a pawn) can occupy any square on
the board, and does so in company with other pieces, all working toward a
strategy that may cease to be tenable if one piece is taken.
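The question of where the tens of thousands come from can be checked with a little arithmetic. The sketch below is only a rough illustration under stated assumptions: it counts the ways to put a handful of distinct pieces on distinct squares and ignores legality, pawns, and whose turn it is. The function name `placements` is made up for this example.

```python
import math

SQUARES = 64  # the eight by eight board

def placements(pieces: int) -> int:
    """Ways to put `pieces` distinct pieces on distinct squares.

    Assumption: any piece may stand on any square; legality is ignored.
    """
    return math.perm(SQUARES, pieces)

# Print the count for one to four pieces.
for k in range(1, 5):
    print(k, placements(k))
```

Two pieces already give 4,032 arrangements and three give 249,984, so the count passes the tens of thousands before a fourth piece is added.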
As training for military strategy, chess is very weak: the best
military strategists break the rules (the classic example is the bypassing of
the Maginot Line). The point remains that reading about chess being played at a
high level will not help someone to play at a high level, other than to give a
glimpse of the sorts of mental structures required.
How much is language like chess – do you need to speak or
read a language to learn it? If a dictionary is the source of a machine’s knowledge
about English, it has an obvious limitation in the brevity of its entries. How
will a machine become competent in handling dense text?
It will do so by observing how a person breaks down complex text into
objects that the machine uses: words, phrases, clauses. Sometimes there is
insufficient information to be certain, so a decision is left until more
information is available, or, if needs must, a decision is made based on
whatever information is available (the machine can use its existing structure
to simulate what the outcome might be, so the decision can be far from static).
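The idea of leaving a decision open until more information arrives, and forcing it only if needs must, can be sketched in a few lines. This is a toy illustration, not the system described here: the lexicon, the three context rules, and the names `decide` and `tag` are all hypothetical, and the simulation step mentioned above is not modelled.

```python
from typing import List, Optional

# Hypothetical toy lexicon: each word lists the roles it could play.
LEXICON = {
    "the": ["det"],
    "they": ["pron"],
    "duck": ["noun", "verb"],
    "swims": ["verb"],
}

def decide(candidates: List[str], prev_role: Optional[str],
           next_role: Optional[str]) -> Optional[str]:
    """Return a role if the context settles it, otherwise None (defer)."""
    if len(candidates) == 1:
        return candidates[0]
    if prev_role == "det" and "noun" in candidates:
        return "noun"
    if prev_role == "pron" and "verb" in candidates:
        return "verb"
    if next_role == "verb" and "noun" in candidates:
        return "noun"
    return None

def tag(words: List[str]) -> List[str]:
    roles: List[Optional[str]] = [None] * len(words)
    for i, word in enumerate(words):
        candidates = LEXICON.get(word, ["unknown"])
        prev_role = roles[i - 1] if i > 0 else None
        roles[i] = decide(candidates, prev_role, None)
        # New information has arrived: retry a deferred neighbour.
        if roles[i] is not None and i > 0 and roles[i - 1] is None:
            before = roles[i - 2] if i > 1 else None
            roles[i - 1] = decide(LEXICON[words[i - 1]], before, roles[i])
    # Needs must: force any decision still open, using the first candidate.
    return [r if r is not None else LEXICON[words[j]][0]
            for j, r in enumerate(roles)]
```

With this lexicon, `tag(["duck", "swims"])` leaves "duck" undecided until "swims" is seen and then settles on a noun, while `tag(["duck"])` has nothing more to wait for and falls back to the first candidate.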
No, we are not offering something right out of the box that
knows everything it will ever need – new words appear, and the meanings of existing
words change, for a start. What we are offering is something that can grow into
whatever role is chosen for it. The important thing is that it does not have the Four Pieces Limit, which plagues