Dictionary Domains
The Oxford English Dictionary (OED) uses domains but does not
define them; the reader is left to work them out, which is not easy.
Description: law
Category: Noun
Feature count: 1
Feature 0: Subcategorization: AggregateForm
Sense count: 4
Sense 0
Definition: the system of rules which
a particular country or community recognizes as regulating the actions of its
members and which it may enforce by the imposition of penalties
Domain count: 1
Domain 0: Law
In this case, the definition of the domain is the same as
one of the definitions of the word.
Making that connection explicit would give more
meaning to the objects in the domain, such as:
Description: abscond
Category: Verb
Feature count: 1
Feature 0: Subcategorization: Intransitive
Sense count: 1
Sense 0
Definition: leave hurriedly and
secretly, typically to avoid detection of or arrest for an unlawful action such
as theft
Domain count: 1
Domain 0: Law
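As a rough illustration of making that connection explicit, the sketch below ties a domain to the word sense that defines it, so that any other sense tagged with the domain can reach the domain's definition. The class names (`Sense`, `Domain`), the `registry` dictionary and the `explain_domains` function are hypothetical names chosen for this example; they are not the structures used in our system or in either dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Sense:
    definition: str
    domains: List[str] = field(default_factory=list)  # domain names, e.g. "Law"


@dataclass
class Domain:
    name: str
    # Hypothetical link: the word sense whose definition also defines the domain.
    defining_sense: Optional[Sense] = None


def explain_domains(sense: Sense, registry: Dict[str, Domain]) -> Dict[str, str]:
    """Map each domain of a sense to the definition of that domain, if known."""
    explained = {}
    for name in sense.domains:
        domain = registry.get(name)
        if domain is not None and domain.defining_sense is not None:
            explained[name] = domain.defining_sense.definition
    return explained


# Sense 0 of "law" doubles as the definition of the domain Law.
law_sense = Sense(
    "the system of rules which a particular country or community recognizes "
    "as regulating the actions of its members and which it may enforce by "
    "the imposition of penalties",
    ["Law"],
)
abscond_sense = Sense(
    "leave hurriedly and secretly, typically to avoid detection of or arrest "
    "for an unlawful action such as theft",
    ["Law"],
)

registry = {"Law": Domain("Law", defining_sense=law_sense)}

# "abscond" can now reach the meaning of its domain through the registry.
abscond_domains = explain_domains(abscond_sense, registry)
```

Storing domain names on the sense and resolving them through a registry keeps the two record types free of circular references, which is only one of several reasonable designs.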
We moved from the OED to Wiktionary, impressed by the number
of words it handles and by the fact that a dictionary lookup
can be processed locally, so that many unknown words can be integrated into the semantic network
on the run (while parsing text that contains them).
Wiktionary does not use the notion of domains, and it also
does something very bad:
Description: law
Category: Noun
Feature count: 3
Feature 0: Number: Singular XXX
Feature 1: Mass: Countable XXX
Feature 2: Mass: Uncountable XXX
This effectively jumbles up “law” used to describe all of
law and “law” describing a single law. We are going to need to unjumble this so the
distinction isn’t lost (the unjumbling is worth it if it allows local
processing of unknown words). Thousands of single laws exist in the domain of
all of law, many of them needing connection to a library of other laws (the
Crimes Act, for example).
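One possible first step in the unjumbling, sketched below, is to detect a feature name that carries contradictory values and split the entry into one variant per value. The function names and the tuple representation of features are assumptions made for this example, and the sketch deliberately stops short of deciding which of the senses belongs to which variant, which is the harder part.

```python
from collections import defaultdict


def conflicting_features(features):
    """Return {feature name: [values]} for names that carry more than one value."""
    values = defaultdict(list)
    for name, value in features:
        if value not in values[name]:
            values[name].append(value)
    return {name: vals for name, vals in values.items() if len(vals) > 1}


def split_entry(description, features):
    """Split an entry into one variant per value of its first conflicting feature."""
    conflicts = conflicting_features(features)
    if not conflicts:
        return [(description, list(features))]
    name, vals = next(iter(conflicts.items()))
    shared = [(n, v) for n, v in features if n != name]
    return [(description, shared + [(name, v)]) for v in vals]


# The three features of the "law" entry shown above.
law_features = [
    ("Number", "Singular"),
    ("Mass", "Countable"),
    ("Mass", "Uncountable"),
]

variants = split_entry("law", law_features)
```

Here `variants` holds two entries for “law”: a countable one (a law) and an uncountable one (all of law), each keeping the shared `Number` feature.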
Sense count: 13
Sense 0
Definition: The body of binding rules
and regulations, customs, and standards established in a community by its
legislative and judicial authorities
The definition of law as a domain
Sub Sense count: 2
Sub Sense 0
Definition: The body of such rules
that pertain to a particular topic
Example count: 2
Example 0: property law
A subdomain of law
Sense 1
This is a repeat of Sense 0
There is a great deal of logic embedded in English to hold it all together, logic that the user is mostly unaware of because it is handled by their Unconscious Mind. We have to implement all that logic explicitly, so that the machine’s understanding mirrors that of the human (even though humans are not consciously aware of the complexity, and will often deny that it even exists, never having had to think about it consciously).
The difference between a semantic approach and an LLM is highlighted by this logic: the LLM does not understand a word, let alone the logic joining the words together.