How Far Will Generative AI Take You Towards AGI?
Generative AI seems like a side road. At its heart, it is an indexing
method for articles on the internet. Someone writes a fairly short,
non-technical article on a subject. You type in a few words that point to that
article, and it returns the article.
First problem – popularity. If there are several or many matches,
it will return the most popular article, not the most valid one – it has no
idea about validity. This means that new, more valid ideas will struggle to
break through – a return to the Dark Ages, where nothing changes. A good
example is the idea that bacteria can cause ulcers. Doctors were strongly
resistant to the notion. If people are never shown such a notion, because it is
less popular than the current view, it will never be accepted.
Second problem – the right prompt may not be obvious, and it
may take the skills of a “prompt engineer” to craft a suitable one. In a
fast-moving technical field, how long will that prompt remain valid? If the more
esoteric topics need engineered prompts, does this invalidate the hope of having a huge
database of valuable information at your fingertips? Having someone change the
question from the one you asked to one that they think you want has obvious
pitfalls.
Third problem – “a picture is worth a thousand words”. If
LLMs are restricted to text (which they don’t understand), how can the text be
usefully linked to an explanatory image? And, given that words can have many
meanings, how do we annotate the image in a way that is not confusing? Pointing
to an area on the wing of an aircraft and saying “flap” leaves no doubt, but
pointing to a picture of a metal box and saying “container” is not adequate if
we mean an ISO container, and the box picture has not brought context with it.
This is where the Unconscious Mind brings 20-40 years of observing objects to
the fore, filling in details so a mountain isn’t left hanging in the air as a
triangle because we didn’t show its base or its three-dimensionality.
AGI will require a grasp of the specific meaning of words, and the ability to seamlessly integrate text, images and video, in the same way that our memory can dig up a small video of a memorable event in our past. An LLM-based solution doesn’t come within a bull’s roar of what is needed.