Why Does Language Do As Well As It Does?
Given that:
1. Words can have multiple parts of speech – “on”, “bar”
2. Words can have many meanings – “bar”, “cut”, “run”
3. Prepositions change verb meaning – “cut up”, “cut off”, “cut down”
4. Phrases provide allusions, metaphors, idiom – “raise the bar”
5. Migrants come from other lands, where they learnt other facts in other languages
The ability of language to be reasonably error-proof is extraordinary. Of course, the error rate climbs in an emergency or war.
‘But we are using a dictionary containing a million words – won’t that be better?’
Dictionaries aren’t very useful: they give you the sense of a word using other words, but those other words can have many meanings, so you can’t be sure what the definition means. In 100,000 senses from a dictionary, there is just one that attempts to clarify what is meant by one of the words in the definition. It is the entry for “crass”:
Description: crass
Category: Adjective
Sense count: 4
Sense 0
Definition: Coarse; crude; unrefined or insensitive; lacking discrimination (in the third sense of the word)
When a definition contains words like “coarse” (2 senses, 4 subsenses) or “insensitive” (35 synonyms, with no clarity as to the sense of each synonym), all you end up with is a vague mess.
A dictionary for human use will give a long list of words that are synonyms (through one of their senses), but will not tell you which sense is meant. For example, “insensitive” has “hard” as a synonym. The actual reference should be:
Synonym 18: Hard
Definition: (of a person) not showing any signs of weakness; tough (“tough” obviously needs clarification)
Part of what we are doing is cleaning up the definitions of words, so that it is clear which sense of each component word is being used in a definition. We end up with a precise web of words that can be searched. It should be more like what a human has, and less like a dictionary.
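As a rough illustration of what such a web might look like, here is a minimal Python sketch. It is not the structure we have built; the names `Sense`, `build_lexicon` and `unresolved_refs` are hypothetical, and the sense numbers are made up for the example. Each word in a definition points at one specific sense of that word, so a search can find the definitions that still need cleaning up.

```python
from dataclasses import dataclass, field

# A sense is identified by (word, part of speech, sense index).
SenseId = tuple[str, str, int]


@dataclass
class Sense:
    sense_id: SenseId
    gloss: str
    # Each ambiguous word in the gloss points at one specific sense of that word.
    refs: dict[str, SenseId] = field(default_factory=dict)


def build_lexicon(senses):
    """Index senses by their identifier."""
    return {s.sense_id: s for s in senses}


def unresolved_refs(lexicon):
    """Return (sense_id, word) pairs whose target sense is not in the lexicon."""
    missing = []
    for sense in lexicon.values():
        for word, target in sense.refs.items():
            if target not in lexicon:
                missing.append((sense.sense_id, word))
    return missing


# Hypothetical entries: the sense indices are illustrative only.
lexicon = build_lexicon([
    Sense(("insensitive", "adj", 0), "hard",
          {"hard": ("hard", "adj", 3)}),
    Sense(("hard", "adj", 3),
          "(of a person) not showing any signs of weakness; tough",
          {"tough": ("tough", "adj", 1)}),
])

# "tough" is referenced but has no entry yet, so it is reported.
print(unresolved_refs(lexicon))  # [(('hard', 'adj', 3), 'tough')]
```

The point of the sketch is only that a reference to “hard” is a reference to one numbered sense, never to the bare word.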
The Oxford Dictionary has domains in which a word operates. These were never thought out in detail, but they need to be.
Anti-Money Laundering legislation has civil law penalties and criminal law penalties. Civil law has “balance of probabilities”; criminal law has “beyond reasonable doubt”. They can’t both be true at once unless they are in separate logical shells, which makes the logical structure that describes them more complicated.
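One way to picture separate logical shells is sketched below in Python. This is an assumption about how the idea could be modelled, not a description of our implementation: the `Shell` class, the proposition and the findings are all hypothetical. Each shell carries its own standard of proof and its own findings, so the same proposition can be established in one and not in the other without contradiction.

```python
class Shell:
    """A self-contained context with its own standard of proof and findings."""

    def __init__(self, name: str, standard: str):
        self.name = name
        self.standard = standard
        self.findings: dict[str, bool] = {}

    def record(self, proposition: str, established: bool) -> None:
        self.findings[proposition] = established

    def established(self, proposition: str):
        # None means this shell has made no finding on the proposition.
        return self.findings.get(proposition)


civil = Shell("civil", "balance of probabilities")
criminal = Shell("criminal", "beyond reasonable doubt")

# Hypothetical proposition and findings, for illustration only.
claim = "the firm failed to report a suspicious transaction"
civil.record(claim, True)
criminal.record(claim, False)

for shell in (civil, criminal):
    print(shell.name, "|", shell.standard, "|", shell.established(claim))
```

Because a finding is only ever looked up inside a shell, there is no single global truth value for the two standards to fight over.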
Figurative Allusion

Consider the allusion to raising the bar of a high jump frame:
Sense 0
Definition: To raise the bar on a high jump frame.
Sense 1
Definition: To raise standards or expectations, especially by creating something to a higher standard
Example 0: Taiwan Semiconductor raised the bar on track widths at the conference (they raised the bar so far that no-one can follow them).
Similar sentences:
Fred raised (the bar on forever chemicals) at the meeting – raising the subject of an abstract prohibition
Fred raised the bar (of the barbell) to his chin, but could go no higher.
The door was barred. Fred raised the bar and opened the door.
A person reading the first sentence may have heard “bar” used to mean a prohibition (one reason why synonyms need to be spotless), or may not consider forever chemicals to be a suitable object on which “to raise the bar”. Raising the bar in the figurative sense is improving human activity, or a product of human activity. Forever chemicals are bad, so raising the bar on them does not make any sense. It is this apparently simple reasoning we need to automate.
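The reasoning in that paragraph can be sketched in a few lines of Python. This is a toy under stated assumptions, not our method: the `Concept` attributes and the function `raise_the_bar_reading` are hypothetical, and the attribute values are hand-assigned for the four example objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    name: str
    human_activity_or_product: bool  # can humans improve it?
    desirable: bool                  # is more or better of it a good thing?
    has_physical_bar: bool = False   # e.g. barbell, barred door, high jump frame


def raise_the_bar_reading(obj: Concept) -> str:
    """Pick a reading of 'raise the bar' for the given object."""
    if obj.has_physical_bar:
        return "literal"
    # The figurative sense means improving a human activity or its product.
    if obj.human_activity_or_product and obj.desirable:
        return "figurative"
    return "no sensible reading"


# Hand-assigned, illustrative attribute values.
track_widths = Concept("track widths", True, True)
forever_chemicals = Concept("forever chemicals", True, False)
barbell = Concept("barbell", False, True, has_physical_bar=True)
door = Concept("barred door", False, True, has_physical_bar=True)

for concept in (track_widths, forever_chemicals, barbell, door):
    print(concept.name, "->", raise_the_bar_reading(concept))
```

When the figurative reading is rejected, as it is for forever chemicals, the sentence has to be parsed another way, for example as raising a subject at a meeting.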
Is there any need to rub it in? LLMs are completely unsuited to this sort of problem, where knowing the meanings of words is paramount.
A person has a very large store of things that they have heard, seen or read, and is willing to accept the meaning used by someone they respect or trust. A dictionary of a million words comes off second best because of its flatness. We have built a structure which represents all the meanings of all the expected words (about 50,000). More reasoning about how words fit together is needed to match a human.
More examples to follow.

