Why Does Language Do As Well As It Does?

 

Given that:

1. Words can have multiple parts of speech – “on”, “bar”
2. Words can have many meanings – “bar”, “cut”, “run”
3. Prepositions change verb meaning – “cut up”, “cut off”, “cut down”
4. Phrases provide allusions, metaphors, idiom – “raise the bar”
5. Migrants come from other lands, where they learnt other facts in other languages

The ability of language to be reasonably error-proof is extraordinary. Of course, the error rate climbs in an emergency or war.

But we are using a dictionary containing a million words – won’t that be better?

Dictionaries aren’t very useful. They give you the sense of a word using other words, but those other words can have many meanings, so you can’t be sure what the definition is. Among 100,000 senses from a dictionary, there is just one that attempts to clarify which sense is meant for one of the words in its definition:

Description: crass
Category: Adjective
Sense count: 4

Sense 0
    Definition: Coarse; crude; unrefined or insensitive; lacking discrimination (in the third sense of the word)

When there are words like “coarse” (2 senses, 4 subsenses) or “insensitive” (35 synonyms, with no clarity as to the sense of the synonym), all you end up with is a vague mess.

A dictionary for human use will give a long list of words that are synonyms (through one of their senses), but not tell you which sense is meant. For example, “insensitive” has “hard” as a synonym. The actual reference should be:

Synonym 18: Hard
    Definition: (of a person) not showing any signs of weakness; tough (“tough” obviously needs clarification)

Part of what we are doing is cleaning up the definitions so that it is clear which sense of each component word is being used in a definition. We end up with a precise web of words that can be searched. It should be more like what a human has, and less like a dictionary.
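To make the idea of a sense-tagged web concrete, here is a minimal sketch in Python. It is not our actual structure: the names SenseRef, Sense, WEB and follow are hypothetical, the sense numbers are invented, and only the definition of “hard” is taken from the example above. The point is simply that each word used in a definition points at one numbered sense, so a search can follow definitions without guessing.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SenseRef:
    """Names one specific sense of a word, e.g. ("hard", 1)."""
    word: str
    sense: int


@dataclass
class Sense:
    definition: str
    # Every ambiguous word in the definition is pinned to an exact sense.
    refs: list = field(default_factory=list)


# Hypothetical miniature web. The sense numbers, and the wording of the
# "insensitive" and "tough" definitions, are invented for illustration.
WEB = {
    SenseRef("insensitive", 0): Sense(
        "showing no feeling for others; hard",
        [SenseRef("hard", 1)],
    ),
    SenseRef("hard", 1): Sense(
        "(of a person) not showing any signs of weakness; tough",
        [SenseRef("tough", 0)],
    ),
    SenseRef("tough", 0): Sense("(of a person) able to endure hardship"),
}


def follow(start):
    """Return every sense reachable from `start` via pinned references."""
    seen, order, stack = set(), [], [start]
    while stack:
        ref = stack.pop()
        if ref in seen or ref not in WEB:
            continue  # already visited, or not defined in this tiny web
        seen.add(ref)
        order.append(ref)
        stack.extend(WEB[ref].refs)
    return order


for ref in follow(SenseRef("insensitive", 0)):
    print(ref.word, ref.sense, "->", WEB[ref].definition)
```

Because a reference is a (word, sense) pair rather than a bare word, the walk from “insensitive” can never wander into “hard” as in “hard surface”.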

The Oxford Dictionary has domains in which a word operates – these were never thought out in detail, but they need to be.

Anti-Money Laundering legislation has civil law penalties and criminal law penalties. Civil law has “balance of probabilities”; criminal law has “beyond reasonable doubt”. The two standards of proof can’t both be true at once unless they are in separate logical shells, which makes the logical structure that describes them more complicated.
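One way to picture separate logical shells is as two scopes, each carrying its own standard of proof, so the same evidence can be tested in both without contradiction. The sketch below is an assumption-laden illustration only: the Shell class and assess function are hypothetical names, and the numeric thresholds are placeholders (the law does not define “beyond reasonable doubt” as a number).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shell:
    """A self-contained scope with its own standard of proof."""
    name: str
    standard: str
    threshold: float  # placeholder number, not a legal definition

    def proven(self, probability):
        return probability > self.threshold


CIVIL = Shell("civil", "balance of probabilities", 0.5)
# 0.95 is an assumed stand-in for "beyond reasonable doubt".
CRIMINAL = Shell("criminal", "beyond reasonable doubt", 0.95)


def assess(probability):
    """Test one piece of evidence inside each shell independently."""
    return {shell.name: shell.proven(probability) for shell in (CIVIL, CRIMINAL)}


print(assess(0.7))
```

With the placeholder thresholds, evidence judged 70% likely is enough inside the civil shell and not enough inside the criminal one. Neither result overrides the other, because each belongs to its own shell.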

Figurative Allusion

Consider the allusion to raising the bar of a high jump frame:

Sense 0
    Definition: To raise the bar on a high jump frame.

Sense 1
    Definition: To raise standards or expectations, especially by creating something to a higher standard
    Example 0: Taiwan Semiconductor raised the bar on track widths at the conference (they raised the bar so far that no-one can follow them).

Similar sentences:

    Fred raised (the bar on forever chemicals) at the meeting. – raising the subject of an abstract prohibition

    Fred raised the bar (of the barbell) to his chin, but could go no higher.

    The door was barred. Fred raised the bar and opened the door.

A reader may have heard “bar” used to mean a prohibition (one reason why synonyms need to be spotless), or may not consider forever chemicals to be a suitable object on which “to raise the bar”. Raising the bar in the figurative sense is improving human activity, or a product of human activity. Forever chemicals are bad, so raising the bar on them does not make any sense. It is this apparently simple reasoning we need to automate.
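As a rough sketch of that reasoning (not our implementation), the check can be written as a test on the object of the phrase. The Thing class, its two flags and the function read_raised_the_bar_on are hypothetical names invented for this example. A real system would have to derive the flags from the web of word senses rather than have them typed in by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thing:
    name: str
    human_activity_or_product: bool  # is it something people do or make?
    desirable: bool                  # would a better version be welcome?


def read_raised_the_bar_on(obj):
    """Pick a reading for 'X raised the bar on <obj>'."""
    if obj.human_activity_or_product and obj.desirable:
        return "figurative: set a higher standard for " + obj.name
    # The figurative sense is ruled out, so fall back to "bar" as a prohibition.
    return "prohibition: brought up a bar on " + obj.name


track_widths = Thing("track widths", True, True)
forever_chemicals = Thing("forever chemicals", True, False)

print(read_raised_the_bar_on(track_widths))
print(read_raised_the_bar_on(forever_chemicals))
```

Forever chemicals are a product of human activity, but nobody wants them improved, so the figurative reading is rejected and the prohibition reading is left standing.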

Is there any need to rub it in? LLMs are completely unsuited to this sort of problem, where knowing the meanings of words is paramount.

A person has a very large store of things that they have heard, seen or read, and is willing to accept the meaning used by someone they respect or trust. A dictionary of a million words comes off as second best because of its flatness. We have built a structure which represents all the meanings of all the expected words (about 50,000). More reasoning about how words fit together is needed to match a human.

More examples to follow.

 

 
