Understanding Language
The major goal of the Active Structure semantic architecture is to learn—and we mean really learn—human language. Current Deep Learning and LLM methods approximate language understanding and usage through complex statistical models of word propinquity, but there is no true language understanding behind them (and statistical models and LLMs are useless in dynamic environments – tariffs, anyone?). This lack of true understanding limits their application and can lead to costly mistakes that can never be fully eliminated, because their cause lies at the very core of the Deep Learning and LLM methods. Our own approach is grounded not in statistics, but in a real understanding of the meaning of words (see the footnote at the end) and the way they relate to one another. But what goes into learning a language? How do humans go about it, and can we implement the same method in a Semantic AI Machine (SAIM)?
We absorb language like a sponge when we are young. Our mother tongue is not explicitly learnt so much as unconsciously extracted. Learning a language when we’re older is difficult—our brains are no longer tuned to pick it up as easily as a child does, and we may need to be explicitly taught the rules. However, it bears mentioning that an adult learning a new language can at least depend on the presence of familiar grammatical elements and structures. The specific words and the syntactic rules that govern their combination may differ, but the way they carve up the external world is familiar. Because of our common “hardware”, all humans share similarities in the way we perceive and organize the world. Our language is grounded in our shared relationship to the world; its structure and content take for granted a common understanding of it.
You probably see where I’m going with this by now: a machine entirely lacks the shared perspective that humans use as scaffolding for interpreting one another’s language. Teaching a SAIM a human language is fundamentally different from teaching a human a new language, and a different approach will be necessary. Though perhaps a bit unconventional, an alternative starting point for this problem is to look at the methods humans use to try to understand the communication of other species. When I’m not helping develop the SAIM approach, I’m a researcher specializing in animal communication. I go out into the field following monkeys around with recording equipment to capture any vocalizations they produce, and back at the lab we try to tease apart what, if anything, these animals may be talking about. We cannot assume anything during our investigations: we lack a shared perspective with the animals we study, just as a SAIM lacks the human perspective.
One of the core problems to solve in my work is the problem of reference. As an example, imagine an anthropologist trying to learn the language of a newly encountered tribe purely by observing the way its members use language (this is the philosopher W. V. O. Quine’s famous “gavagai” thought experiment). While out walking with a member of the tribe, a rabbit hops by and the individual points at it while exclaiming “gavagai”. From this episode, the anthropologist initially concludes that gavagai means rabbit, but upon further reflection he begins to doubt this conclusion. Gavagai could mean animal (a higher-order classification), it could mean furry (a quality), it could mean food (a function), it could refer to the way the rabbit is moving (an action), or it may not refer to the rabbit at all (“Look at that!”). To disentangle the specific meaning of the word, we need to hear it uttered in many different contexts, noting everything that is occurring at the time and testing different hypotheses about its meaning until a likely one can be determined.
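To make that process concrete, here is a minimal sketch of what is sometimes called cross-situational learning: pair each hearing of a word with the candidate meanings present in its context, and keep only the candidates that survive every episode. The word, the candidates, and the episodes below are all invented for illustration.

```python
# Minimal cross-situational word learner: a sketch of the hypothesis-
# elimination process described above. All data here is invented.

def learn_meanings(observations):
    """observations: list of (word, candidate_meanings_in_context) pairs.

    For each word, keep only the candidate meanings that were present
    in *every* context in which the word was heard.
    """
    hypotheses = {}
    for word, candidates in observations:
        if word not in hypotheses:
            hypotheses[word] = set(candidates)   # first hearing: anything goes
        else:
            hypotheses[word] &= set(candidates)  # later hearings prune the set
    return hypotheses

# Three hypothetical episodes in which "gavagai" is uttered.
episodes = [
    ("gavagai", {"rabbit", "animal", "furry", "food", "hopping"}),
    ("gavagai", {"rabbit", "animal", "sitting"}),   # not hopping this time
    ("gavagai", {"rabbit", "furry", "food"}),       # no other animals in view
]

print(learn_meanings(episodes))  # {'gavagai': {'rabbit'}}
```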
The anthropologist can at least rely on the fact that humans from a totally different language culture still carve up the world in a way that is largely understandable to other humans. There will be familiar word categories like nouns, verbs, and adjectives, and the things those words refer to will probably be recognizable as relevant features to a foreigner as well. This cannot be taken as a given when we are trying to understand the meaning of animal vocalizations. We have very little idea of how animals carve up the world, so we cannot say ahead of time what they may or may not consider worth conversing about. We cannot rely on a shared way of perceiving the world to guide our interpretations.
Like the anthropologist in the story, when we try to disentangle the meaning of animal vocalizations, we form hypotheses about what certain sounds may mean and test those hypotheses against the contexts in which they occur. During a field study on animal communication, we track every piece of context we can conceive of as relevant—everything from recent social interactions to environmental features, even the lighting conditions at the time. We then use complex statistical models to extract patterns and test hypotheses linking animal vocalizations to the environmental features they may refer to.
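As a toy stand-in for those models (which in practice are far richer), the sketch below asks whether a hypothetical call type is statistically associated with a context, using a simple chi-squared test on invented counts:

```python
# Toy association test between call types and contexts: a stand-in for
# the far richer statistical models used in real field studies.
from scipy.stats import chi2_contingency

# Rows: two hypothetical call types. Columns: contexts in which each
# call was recorded (predator present vs. feeding). Counts invented.
observed = [
    [40,  5],   # call A: mostly heard when a predator is present
    [ 6, 38],   # call B: mostly heard during feeding
]

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.1f}, p = {p:.2g}")  # a tiny p suggests calls track context
```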
In some cases, the answer has come easily—at least at first. Some of the early evidence for referential communication in animals came from vervet monkeys, a terrestrial primate species native to Africa. Early studies of the species revealed that they produce distinct calls referring to specific predators: the call they use when they spot an eagle is different from the call they use when encountering a snake or a leopard. Not only that, but the other monkeys in the group respond in a distinct and situationally appropriate way depending on which call they heard, showing that they understand the referent of the call. Though animal alarm calls were known long before this, it was assumed that they had a more general referent (“something threatening is present”) or were produced in response to an internal state (“I am frightened”). What the study of these alarm calls accomplished was to demonstrate that animals, too, may deploy specific vocalizations for the purpose of informing others about things in their environment. In short, it demonstrated communicative intent and reference, not unlike the way humans converse with one another using words that pick out specific features of their environment.
From that starting point, research on animal communication has exploded, covering species from ravens to dolphins, with evidence of referential communication appearing in many unlikely places. Prairie dogs have a predator alarm call system so detailed that they can communicate the color of the shirt a human is wearing. Chimpanzees communicate with each other about the quality of newly discovered food. Elephants even address each other with specific calls—in other words, they may have names for one another. The task of disentangling animal communication is, however, far from over. The best evidence still comes from simple alarm systems, and in most cases we’re far from determining what—if anything—animals may be talking about.
At this point, we can confidently say that at least some animals have the capacity to communicate about things relevant to their day-to-day lives, like predators and food, but how flexible is this system? Suppose the environment changes very rapidly and the need to communicate something entirely new arises. What allows human language to adapt and express new ideas is the fact that we combine small chunks of meaning into larger ones: morphemes are combined to form new words with new meanings, and words are combined into sentences that express complex meanings. This ability to combine small chunks of meaning into a practically infinite variety of combinations is what grants human communication its flexibility.
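A crude way to picture this in code is to give each word a small chunk of meaning and let a combination rule merge the chunks into a larger message; the lexicon and the rule below are purely illustrative:

```python
# A crude picture of compositionality: small chunks of meaning merge
# into a larger structured message. Lexicon and rule are illustrative.

lexicon = {
    "eagle":   {"referent": "eagle"},
    "leopard": {"referent": "leopard"},
    "near":    {"distance": "close-by"},
    "far":     {"distance": "distant"},
}

def compose(*words):
    """Merge the meaning chunks of several words into one message."""
    message = {}
    for word in words:
        message.update(lexicon[word])
    return message

print(compose("eagle", "near"))   # {'referent': 'eagle', 'distance': 'close-by'}
print(compose("leopard", "far"))  # {'referent': 'leopard', 'distance': 'distant'}
```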
Though still a fresh avenue of study, early research into combinatorial vocalization in animals has demonstrated that it is widespread across the animal kingdom. The presence of combinatorial usage on its own does not, however, prove that these combined calls have semantic content beyond that of the basic units composing them. So far, most evidence for semantic content has come from primates, with a variety of monkey species combining vocalizations in a fashion that resembles affixation to convey specific meanings in alarm call contexts. One bird species, the Japanese great tit, has also been shown to encode semantic content using syntactic rules, such that the call sequence ABC-D means something different from D-ABC.
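To illustrate the role of order, here is a trivial sketch in which the same units in a different sequence yield a different response; the glosses are loose paraphrases of how the great tit findings are usually described, not the actual experimental stimuli:

```python
# Order-sensitive interpretation, loosely modelled on the great tit
# findings: the same units in a different order yield a different (or
# no) message. Glosses are loose paraphrases, for illustration only.

def interpret(sequence):
    units = tuple(sequence.split("-"))
    if units == ("ABC", "D"):
        return "scan for danger, then approach"  # natural order: understood
    if units == ("D", "ABC"):
        return "no coherent response"            # reversed order: ignored
    return "unknown sequence"

print(interpret("ABC-D"))  # scan for danger, then approach
print(interpret("D-ABC"))  # no coherent response
```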
Perhaps unsurprisingly, the best evidence for semantic content in vocal combinations comes from our closest relatives, the chimpanzees. They have been observed to produce a wide variety of vocal combinations specific to particular contexts, suggesting that these combinations convey semantic content relating to those contexts. Not only that, but the order in which they combine their calls changes the meaning conveyed, demonstrating the use of syntactic rules in chimpanzees. The rules governing appropriate call order have been shown to vary between populations, suggesting that there are different chimpanzee “dialects”. This final point is especially pertinent to our discussion of the capacity of animal communication to convey new information: it demonstrates that the sequences chimpanzees produce are learnt, not purely instinctual, meaning that the way they combine vocalizations is flexible and can potentially accommodate new meanings. Further support comes from chimpanzees that have been taught to communicate with humans: Washoe the chimpanzee famously used American Sign Language (ASL) to sign “water bird” upon first encountering a swan, flexibly combining previously learnt words to refer to something new.
So what’s all this got to do with teaching language to a SAIM? A few lessons can be gleaned from our discussion. For one, the capacity to make flexible use of language by combining previously learnt words in new ways is critical. The world, and language itself, are constantly (and sometimes rapidly) changing, and any intelligence seeking to communicate about a changing world needs a sufficient grasp of language to combine its components flexibly.
The most important takeaway here is that though the referent of a word and its intended significance may be readily evident to us, the SAIM has none of the background understanding we rely on in our interpretation. Like the researcher trying to decode monkey communication, the SAIM cannot assume anything at the outset. On its own, it would need to weigh all the possibilities when trying to make sense of human language, observing a word in many different contexts before it could settle on its meaning. We can aid this process by making every aspect of our language as explicit as possible.
So much of our interpretive gear is housed within the unconscious mind. The SAIM does not share our perspective and can depend on none of the scaffolding we rely on to interpret language. To ease the journey of the language-learning Semantic AI Machine as much as possible, the unconscious needs to be made conscious and the assumed needs to be made explicit. This step is also structurally necessary for the SAIM: it will have nothing comparable to the human unconscious. The capacity and processing power of the SAIM can potentially be increased far beyond human limits, which removes the need for a conscious–unconscious segmentation. There will certainly be layering, and processes running in the background that rarely surface, but there will need to be the capacity for elaboration or repair at every point (and for archiving of what has been changed). This necessitates transparency and surface accessibility, which will be another key difference between the architecture of the SAIM and the human mind.
The analogy between a machine learning human language and a human decoding animal communication underscores a fundamental asymmetry between researcher and subject. When humans study animal communication, they do so from a position of cognitive advantage, equipped with tools for abstraction, hypothesis testing, and contextual reasoning. For a machine to truly grasp human language in a comparable way, it may need to operate from a similarly elevated position. This means not merely imitating human language use, but understanding it deeply enough to analyze, generalize, and even repair it. In other words, while the SAIM begins as an outsider to the human perspective, its architecture must ultimately support capabilities that allow it to stand above, or at least on a level with, human understanding. This requires an artificial mind with the clarity and transparency to reconstruct meaning from the ground up.
(When we say “meaning”, we mean that an active structure is created which emulates what the real-world structure would do. If a verb indicates an action – “he ran away” – that action is carried out (abstractly). If a verb or adjective indicates a change of state, the object changes its state – “he became angry”. The text becomes a working model, making it much easier to understand complex behaviour that lies beyond a human’s ability to grasp from text alone – too many moving parts.)
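As a rough illustration of this idea (and emphatically not the Active Structure implementation itself), a toy model might let a parsed clause actually change the state of the object it describes:

```python
# A toy illustration of the footnote's idea: interpreting a clause
# mutates a model of the world, so the text becomes a working model.
# This is a sketch of the concept only, not the real architecture.

class Entity:
    def __init__(self, name):
        self.name = name
        self.state = "neutral"
        self.location = "here"

def apply_clause(subject, verb_phrase):
    """Carry out (abstractly) what the verb phrase says."""
    if verb_phrase == "ran away":         # action verb: perform the action
        subject.location = "elsewhere"
    elif verb_phrase == "became angry":   # state-change verb: update state
        subject.state = "angry"

he = Entity("he")
apply_clause(he, "ran away")
apply_clause(he, "became angry")
print(he.location, he.state)  # elsewhere angry
```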
by Ryan Sigmundson