How the human brain differs from deep learning approaches in AI

There is an emerging view of the human brain as an engine of probabilistic prediction. Statistically driven perspectives of the mind, such as predictive processing and the Bayesian brain hypothesis, have gained prominence over the last decade. On these views, incoming sensory data is interpreted through the lens of expectations derived from our internal models of the world, and the patterns we extract from our environment are used to update the models that will shape future interpretations. Framed this way, there are surface similarities to some of the techniques used in recent AI applications. The deep learning (DL) approaches currently dominating the field of AI are in essence large-scale probability-driven algorithms, adept at finding patterns in large data sets. Given this surface similarity to DL training methods, is the human brain really so different? The short answer is “yes”, but the “why” behind that requires some unpacking.

Broadly speaking, machine learning systems recognize patterns in past observed data and use them to predict future outcomes. Deep learning is a subset of machine learning that relies on artificial neural networks (ANNs) to encode patterns in data. An ANN is a network of nodes organized into connected layers. Signals travel between nodes along weighted connections; more heavily weighted connections exert more influence on the activation of the next layer of nodes. The final layer combines these weighted contributions to produce an output. The ANN “learns” by comparing its output to a success criterion (such as human-labelled images in the case of image recognition), measuring the error between its own output and the target, and adjusting its weights to reduce this error with each successive iteration. This process continues until the machine gradually comes to produce output close to the target criterion. Information is fed to the machine in the form of data sets, which the ANN then learns to classify through this process.
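The training loop described above can be sketched in a few lines. This is a deliberately minimal illustration, not a real deep learning stack: a single artificial neuron learns the logical AND function by repeatedly comparing its output to the target and nudging its weights to shrink the error.

```python
import numpy as np

rng = np.random.default_rng(0)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # inputs
y = np.array([0, 0, 0, 1], dtype=float)                      # AND targets

w = rng.normal(size=2)   # connection weights (randomly initialized)
b = 0.0                  # bias term
lr = 1.0                 # learning rate

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for epoch in range(10000):
    out = sigmoid(X @ w + b)   # forward pass: weighted sum -> activation
    err = out - y              # error against the success criterion
    # gradient of the squared error w.r.t. the pre-activation signal
    grad = err * out * (1 - out)
    w -= lr * (X.T @ grad) / len(X)   # adjust weights to reduce error
    b -= lr * grad.mean()             # adjust bias the same way

print(np.round(sigmoid(X @ w + b)))  # outputs approach [0, 0, 0, 1]
```

Each pass through the loop is one iteration of the compare-and-adjust cycle described in the paragraph above; over many iterations the weighted connections settle into values that reproduce the target behaviour.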

The ANNs themselves are loosely based on a simplified view of the way neurons in the brain encode information. It’s worth emphasizing that this view is not only heavily simplified but also heavily outdated. The ANN approach was conceived and developed many decades ago, and while neuroscience has progressed considerably since then, the same early principles have continued to inform subsequent AI technologies. While it is true that the summation of neural inputs contributes to the activation of connected neurons (not unlike the basic idea behind ANNs), the similarities between the two systems do not extend much beyond that. In ANNs, data is represented in a centralized rather than distributed form, data is processed sequentially rather than in parallel, and the inbuilt feedback systems present in the brain have no equivalent. Furthermore, the brain can actively modify its own architecture, producing entirely new neural connections (or eliminating old ones), whereas ANNs cannot go beyond adjusting their connection weights: their architecture is fixed from the outset, which places far greater limits on the restructuring that can occur. What this all adds up to is a machine much less versatile than the brain.

To take a specific example of an implementation of the DL approach, let’s focus our lens on large language models (LLMs). These models are essentially probability maps that model the likelihood that a given word will follow another given word, not entirely dissimilar to the autocomplete function you find on any smartphone. A large corpus of text is fed to the model (the training set), and when given a prompt the model will produce the words most likely to follow that sequence. For LLMs to move beyond being high-powered autocomplete functions, they need to be implemented in a system that frames these prompts in a particular way. Chatbot implementations of LLMs, for example, may present the prompts to the model within the framework of a conversation in which their role is to provide a direct and useful response, with the result that the model generates words in the form of a reply to the prompt. The reason it does not produce gibberish, or even something ungrammatical, is that ungrammatical nonsense is far less likely to occur in the corpus it was trained on than sensible responses, with the result that the machine more often than not gives the impression of competence.
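The “probability map” idea can be made concrete with a toy bigram model: count which word follows which in a training corpus, then predict the most frequent continuation. Real LLMs condition on long contexts with deep neural networks rather than one-word lookback, and the three-sentence corpus here is invented for illustration, but the underlying principle is the same.

```python
from collections import Counter, defaultdict

# A tiny training corpus (invented for illustration).
corpus = (
    "plants produce energy from sunlight . "
    "plants produce oxygen from water . "
    "animals produce energy from food ."
).split()

# Count, for each word, which words follow it and how often.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(word):
    # Return the word most frequently seen after `word` in the corpus.
    counts = follows[word]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("produce"))  # "energy" (seen twice vs. "oxygen" once)
```

The model "knows" that "energy" follows "produce" only in the sense that this pairing occurs most often in its training text; scale that counting up by many orders of magnitude, replace the lookup table with a neural network, and you have the skeleton of the autocomplete behaviour described above.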

Disregarding any appearance of interacting with an intelligent agent, it cannot be emphasized enough that the model has no understanding of anything it is producing. If one were to ask it how plants produce energy from sunlight and the machine were to reply with an explanation of photosynthesis, it is because, in the context of a conversation in which one actor asks another how plants produce energy, the most likely reply from a conversation partner is an explanation of photosynthesis. A consequence of this complete lack of understanding is that the model has no means of distinguishing truth from falsehood. The only metric the model itself uses when crafting its reply is the likelihood that words appear paired together in its corpus, and that gives no measure of the truth of any given statement. The machine makes no distinction between Donald Duck and Donald Trump in terms of truth. Reality holds no special privilege over total fabrication, which accounts for the reputation of chatbots as occasional generators of misinformation. But even when it is outright fabricating information, it does so with the same surface appearance of competence it brings to everything else. The result is that one needs to fact-check its replies, which rather defeats the entire point.
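The point that likelihood is not truth can be demonstrated with the same bigram-counting idea: a frequency-based model scores a statement purely by how often its word pairs appear in the training text, so a frequently repeated falsehood outscores a rarely stated truth. The corpus and sentences below are invented for illustration.

```python
from collections import Counter

# A corpus in which a falsehood appears five times and a truth once.
corpus = ("the moon is made of cheese . " * 5 +
          "the moon is made of rock .").split()

# Count how often each adjacent word pair occurs.
pair_counts = Counter(zip(corpus, corpus[1:]))

def score(sentence):
    # Sum of bigram counts: a crude proxy for likelihood under the corpus.
    words = sentence.split()
    return sum(pair_counts[p] for p in zip(words, words[1:]))

print(score("the moon is made of cheese"))  # higher: the repeated falsehood
print(score("the moon is made of rock"))    # lower: the rarer truth
```

Nothing in the scoring function has any access to the world; it can only report what its corpus says most often, which is exactly the limitation the paragraph above describes.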

A further deficit of LLMs is that they lack communicative intent. A conversation with another individual occurs in a context in which the speaker knows they will be heard and that their words may go on to affect the beliefs and behavior of the listener. This shapes the way we communicate information. Furthermore, when we interact with another individual, especially a known individual, we bring to the interaction beliefs about the intent and knowledge state of our interaction partner. We may have some idea of what they already know, so we need not reiterate things we can take as given. We may have some idea of their motive in asking a question (perhaps they are only interested in a specific aspect of the answer), which allows us to leave out irrelevant information. In short, we are able to communicate relevant information more efficiently and effectively because of outside knowledge that we bring to the conversation and apply with communicative intent. This is another feature that LLMs entirely lack, and it limits their effectiveness as conversation partners.

A further distinction between the human mind and DL-based AI is that we have the capacity to organize the information we have extracted into modifiable concepts that can be reflected upon. The meaning of words can be represented abstractly and generalized to new situations. Statistical regularities extracted from the environment go on to form higher-level concepts (such as beliefs) that exert a top-down influence on our behavioral output. These higher-level concepts may be accessible for self-reflection and deliberate modification. This results in a feedback loop between a reflective agent, its own internal workings, and its environment. Our ability to actively reflect upon and modify aspects of our own mental architecture is a key feature that strongly distinguishes us from pure applications of the DL approach.

Knowledge transfer, cross-domain reasoning, and flexible problem solving are abilities in which the human mind far outshines anything DL could produce. Training in DL amounts to building a predictive model out of a data set well suited to solving a particular problem. The architecture that emerges from this training may perform well enough on the task it was trained for, but it has no means of transferring its knowledge beyond the bounds of that domain. This may be sufficient when the trained architecture is used as a tool for a closed-environment task, but it becomes a problem when these architectures are applied (or misapplied, depending on who you ask) to real-world environments. The training data set cannot encompass all possible exceptions that may be encountered, and without flexible problem-solving abilities, the trained architecture cannot effectively react to unforeseen occurrences (try letting your search engine autocomplete “self-driving car crashes into…” and take your pick to see some examples). This is more than just a problem to be overcome; it lies at the very core of the deep learning approach and cannot be improved upon by further extensions of the same logic. There may still be a future for the DL approach as a single component of a more multi-faceted AI architecture (possibly as a module for statistical learning), but the industry’s current overreliance on this approach is misguided and overly optimistic. The technology is already approaching its limit.

Let’s look at a potential AI application: removing the second pilot on commercial airline flights.

We load up the system with instruction manuals, wiring diagrams, and short courses on aerodynamics and electronics. Fault behaviour is very rare, so no training set we could provide would cover the space of possible faults with many examples.

During a flight, the aircraft begins to manifest a fault. There is nothing in the manual that accounts for the issue. What happens now?

We ask the system to hypothesize about the cause of the fault. It begins to search around the general location of the fault, looking for out-of-spec operation. It tracks the fault down to a specific module, then asks for and receives approval to take it offline.

How did it accomplish this? It used the meanings of words, and the realization of those words in the architecture of different systems; it could not simply regurgitate a string of words that already existed. It used AGI. The DL approach cannot produce outcomes like this, because the problem is unique and not represented with sufficient frequency in any training set that could be provided to the model. DL may have its uses as an updated ELIZA, where users are not critical of its performance, but deploying these technologies in serious applications is outright irresponsible.

