Large Language Models
Google is touting its Large Language Model, PaLM, with a
claimed 540 billion parameters.
It learns by reading text.
A few problems.
Firstly, complex text – a piece of legislation, a high-value
contract or a Defence specification – has its own definitions, declared in a
glossary or somewhere else in the document. Words or phrases defined this way
are meant to override their common meanings. For example:
state of mind of a person includes:
(a) the knowledge, intention, opinion, suspicion, belief or purpose of the person; and
(b) the person’s reasons for the intention, opinion, belief or purpose.
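To make that concrete, here is a minimal sketch, in Python, of a document-local glossary overriding a common lexicon. Everything in it – the class, the stand-in common meaning, the paraphrased definition – is invented for illustration, not taken from any real system.

    # Minimal sketch: a document-local glossary whose definitions
    # override the common meanings in a general lexicon.
    COMMON_LEXICON = {
        "state of mind": "a person's mood or mental disposition",  # invented
    }

    class Glossary:
        """Definitions declared inside one document."""
        def __init__(self):
            self.local = {}

        def define(self, term, definition):
            self.local[term] = definition

        def meaning(self, term):
            # A term defined in the document overrides the common meaning.
            return self.local.get(term, COMMON_LEXICON.get(term))

    doc = Glossary()
    doc.define(
        "state of mind",
        "the knowledge, intention, opinion, suspicion, belief or purpose "
        "of the person, and the person's reasons for them",
    )
    print(doc.meaning("state of mind"))    # the document's definition wins
    print(doc.meaning("corporate group"))  # None - defined nowhere here

The point the sketch makes is that the lookup order is fixed: the document speaks first, and the common meaning is only a fallback.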
Secondly, complex text relies heavily on bullets, with a
reference to a particular bullet capturing the phrase or clause it labels. For example:
(b) a regular premium policy to which paragraph (a) does not apply

corporate group has the meaning given by subsection 123(12).
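The same point in miniature for bullet references – a toy lookup in which the subsection number comes from the example above, but the text of paragraph (a), which the original does not quote, is invented:

    # Minimal sketch: resolving a reference like "paragraph (a)" back to
    # the clause it labels. The text of (a) is an invented placeholder.
    SECTIONS = {
        "123(12)": {
            "(a)": "a policy issued before the commencement day",  # invented
            "(b)": "a regular premium policy to which paragraph (a) "
                   "does not apply",
        },
    }

    def resolve(subsection, paragraph):
        """Return the clause a bullet reference points to."""
        return SECTIONS[subsection][paragraph]

    # "paragraph (a)" only means something inside this document:
    print(resolve("123(12)", "(a)"))

No amount of outside reading tells you what "(a)" is; only the document's own structure does.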
In other words, a complex text document is its own thing,
and no amount of reading other texts will help to understand the meanings in it.
An LLM doesn’t seem aimed at complex text. What sort of text is
it aimed at?
“…the idea is that we will try to attack this problem very
directly, this problem of few-shot learning, which is this problem of
generalizing from little amounts of data (that is what a dictionary is – it
might be a good place to start).

…the main idea in what I’ll present is that, instead of trying to define what
that learning algorithm is by hand, using our intuition as to what is the
right algorithm for doing few-shot learning, we actually try to learn that
algorithm in an end-to-end way.

And that’s why we call it learning to learn or, as I like to call it, meta
learning.”
The goal of the few-shot approach is to approximate how
humans learn different things and can apply those different bits of knowledge
together to solve new problems that have never been encountered before.
The advantage then is a machine that can leverage all
of the knowledge that it has to solve new problems.
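As a toy illustration of what “generalizing from little amounts of data” asks for, the sketch below labels a new example by its nearest labelled example. It is a deliberately crude stand-in, not the meta-learning method the quote describes.

    # Toy sketch of few-shot classification: label a query by its
    # nearest labelled example. A stand-in, not the quoted method.
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def few_shot_classify(support, query):
        """support: a handful of (features, label) pairs - the 'few shots'."""
        return min(support, key=lambda pair: distance(pair[0], query))[1]

    support = [((0.0, 0.1), "cold"), ((0.9, 1.0), "hot")]  # two shots
    print(few_shot_classify(support, (0.8, 0.7)))          # -> "hot"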
As soon as someone mentions “algorithm” in a
discussion of language, it should raise a red flag. The meaning is in the words,
and how they connect together.
The obvious question is: instead of reading random
text to learn things, why not read a dictionary? A dictionary encapsulates
hundreds of man-years of effort.
A dictionary:

May use other words to help define a word.
    hilarious -> extremely amusing

May use grammatical objects to give a broader definition of a word.
    of, Sub Sense 1 – Definition: followed by a noun expressing the object of the verb underlying the first noun – the owner of the boat

May indicate that the purpose of a word is not to add meaning, but emphasis.
    (used for emphasis) only to a small extent; not much or often

May indicate that the use of a word is figurative or hyperbolic.
    barnacle, Sense 1 – Figurative (of Sense 0) – Definition: a tenacious person or thing
    freezing, Sub Sense 0 – (used hyperbolically) very cold

A small sketch of this sense structure follows the list.
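Here is a minimal sketch of the sense structure those entries suggest: a definition plus optional usage labels. The field names and layout are invented for illustration, not the OED’s actual schema.

    # Minimal sketch of a dictionary sense: definition text plus labels
    # that flag emphasis, figurative or hyperbolic use.
    from dataclasses import dataclass, field

    @dataclass
    class Sense:
        definition: str
        labels: list = field(default_factory=list)  # e.g. ["figurative"]

    entries = {
        "hilarious": [Sense("extremely amusing")],
        "barnacle":  [Sense("a tenacious person or thing",
                            labels=["figurative (of sense 0)"])],
        "freezing":  [Sense("very cold", labels=["hyperbolic"])],
    }

    # A label tells the reader the word adds emphasis or imagery,
    # not a new literal meaning.
    for word, senses in entries.items():
        for sense in senses:
            print(word, "-", sense.definition, sense.labels)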
In other words, a dictionary would appear to be the obvious
thing to read to gain knowledge of a language.
Currently, we have 20,000 files containing definitions from
the OED – 70,000 definitions, with 700,000 words in those definitions –
supporting a vocabulary of 45,000 words and 15,000 wordgroups (combinations of
words, such as “bank account”). As a rough rule, each word uses ten network
elements, so those 700,000 words need about seven million network elements,
leaving thirty million elements to handle a particular problem. The system has
the ability to look up a word it doesn’t know and merge the new definitions
into its structure. It also has to fabricate definitions for comparative and
superlative adjectives and for gerunds “on the run”.
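A sketch of what fabricating a definition “on the run” might look like. The morphological rules and the wording of the fabricated definitions are invented assumptions, not the system’s actual ones.

    # Minimal sketch: fabricate a definition for a derived form from
    # its base entry. Rules and wordings are invented for illustration.
    def fabricate(word, base, base_definition):
        """Guess a definition for a form derived from a known base entry."""
        if word in (base + "er", base + "r"):
            return "more " + base_definition       # comparative adjective
        if word in (base + "est", base + "st"):
            return "most " + base_definition       # superlative adjective
        if word.endswith("ing") and word.startswith(base.rstrip("e")):
            return "the action of " + base         # gerund
        return None                                # fall back to a lookup

    print(fabricate("colder", "cold", "of low temperature"))
    print(fabricate("coldest", "cold", "of low temperature"))
    print(fabricate("freezing", "freeze", "turn to ice"))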
Dictionaries have a bad rap for circularity, but that can be
managed: the machine can carry out instructions in English to clean itself in
different ways.
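One such cleaning pass might look like the following sketch, which simply walks the words used in each definition until one repeats, so a circular chain can be broken or grounded elsewhere. The two-word lexicon is invented for illustration.

    # Minimal sketch of one cleaning pass: find a circular chain of
    # definitions. The tiny lexicon is invented for illustration.
    def find_cycle(lexicon, word, path=()):
        """Follow the words used in definitions until one repeats."""
        if word in path:
            return path + (word,)
        for used in lexicon.get(word, ()):
            cycle = find_cycle(lexicon, used, path + (word,))
            if cycle:
                return cycle
        return None

    lexicon = {
        "big": ["large"],   # each entry lists the words its definition uses
        "large": ["big"],
    }
    print(find_cycle(lexicon, "big"))  # ('big', 'large', 'big')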
Google has had great success with a huge database and keyword
searching. Knowing that, it is not surprising that they thought a huge
structure would be useful for language. But why would anyone else think that?