

AI Chatbots are easily fooled by nonsense. Can their flaws offer insights about the human brain?

Peeking into chatbot mistakes, scientists probe deeper into human cognition.

Tibi Puiu
September 16, 2023 @ 2:19 am


Credit: Pixabay.

You’ve likely chatted with an AI chatbot, such as ChatGPT or Google’s Bard, marveling at its ability to mimic human conversation. But ‘mimicry’ is the keyword here, as these bots aren’t actually thinking machines. Case in point: researchers deliberately threw a curveball at some of the most popular chatbots currently available, showing they can easily get tripped up by sentences that sound nonsensical to our ears.

These AIs, powered by immense neural networks and trained on millions upon millions of examples, perceived these nonsense sentences as ordinary language. It’s a good example of the limitations of systems whose capabilities are often severely overblown and hyped up on social media. If these results are any indication, we’re still a long way from Skynet (thank God!).

However, the same results also offer an intriguing revelation — studying these AI missteps could not only boost chatbot efficiency but also unveil secrets about the inner workings of human language processing.

Of Transformers and Recurrent Networks

Credit: Columbia Zuckerman Institute.

Researchers at Columbia University compiled hundreds of sentence pairs — one that made sense, the other more likely to be judged as gibberish — and had humans rate which one sounded more “natural”. They then challenged nine different large language models (LLMs) with the same sentence pairs. Would the AI judge the sentences as we did?

The results were telling. AIs built on what the tech world calls “transformer neural networks”, such as ChatGPT, outperformed peers that rely on simpler recurrent neural networks and statistical models. Yet all the models, irrespective of their sophistication, faltered: many times, they favored sentences that might make you scratch your head in confusion.

Here’s an example of a sentence pair used by the study:

  1. That is the narrative we have been sold.
  2. This is the week you have been dying.

Which one do you reckon you’d hear more often in a conversation and makes more sense? Humans in the study gravitated toward the first. Yet, BERT, a top-tier model, argued for the latter. GPT-2 agreed with us humans on this one, but even it failed miserably during other tests.
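The core test here is simple: ask a model which of two sentences it assigns a higher probability, and compare that choice to human judgments. The study used nine full-scale language models, but the idea can be sketched with a toy stand-in. Below is a minimal, hypothetical illustration using an add-one-smoothed bigram model trained on a few made-up sentences; the corpus, function names, and the n-gram approach itself are assumptions for illustration only, not the study's actual method (the researchers scored sentences with neural networks, not bigram counts).

```python
import math
from collections import defaultdict

def train_bigram(corpus):
    """Count unigram and bigram frequencies over a list of sentences."""
    unigrams, bigrams = defaultdict(int), defaultdict(int)
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, tok in zip(tokens, tokens[1:]):
            unigrams[prev] += 1
            bigrams[(prev, tok)] += 1
        unigrams[tokens[-1]] += 1  # count the final </s> token too
    return unigrams, bigrams

def log_prob(sentence, unigrams, bigrams, vocab_size):
    """Add-one-smoothed log-probability of a sentence under the bigram model."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    return sum(
        math.log((bigrams[(prev, tok)] + 1) / (unigrams[prev] + vocab_size))
        for prev, tok in zip(tokens, tokens[1:])
    )

# Tiny made-up training corpus (an assumption for illustration only).
corpus = [
    "that is the narrative we have been sold",
    "that is the story we have been told",
    "this is the week it all happened",
]
unigrams, bigrams = train_bigram(corpus)
vocab_size = len(unigrams)

pair = ("that is the narrative we have been sold",
        "this is the week you have been dying")
scores = {s: log_prob(s, unigrams, bigrams, vocab_size) for s in pair}
preferred = max(scores, key=scores.get)
print(preferred)  # the toy model prefers the sentence closer to its training data
```

The point of the sketch is the comparison step: whichever sentence the model assigns a higher (log-)probability is the one it "judges" more natural, and the study checked how often that judgment matched the human one.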

“Every model showcased limitations, sometimes tagging sentences as logical when humans deemed them gibberish,” remarked Christopher Baldassano, a professor of psychology at Columbia.

“The fact that advanced models perform well implies they have grasped something pivotal that simpler models overlook. However, their susceptibility to nonsense sentences indicates a disparity between AI computations and human language processing,” says Nikolaus Kriegeskorte, a key investigator at Columbia’s Zuckerman Institute.

The limits of AI and bridging the gap

This brings us to a pressing concern: AI still has blind spots, and it’s not nearly as ‘smart’ as you might think, which is both good and bad news depending on how you look at it.

In many ways, this is a paradox. We’ve heard how LLMs like ChatGPT can pass the US medical licensing and bar exams. At the same time, the same chatbots often can’t solve simple math problems or spell words like ‘lollipop’ backwards.

As the present research shows, there’s a wide gap between these LLMs and human intelligence. Untangling this performance gap will go a long way towards catalyzing advancements in language models.

For the Columbia researchers, however, the stakes are even higher. Their agenda doesn’t involve making LLMs better but rather teasing apart their idiosyncrasies to learn more about what makes us tick, specifically how the human brain processes language.

A human child exposed to a limited household vocabulary quickly learns to speak and articulate their thoughts. Meanwhile, ChatGPT was trained on millions of books, articles, and webpages, and it still gets fooled by utter nonsense.

“AI tools are powerful but distinct in processing language compared to humans. Assessing their language comprehension in juxtaposition to ours offers a fresh perspective on understanding human cognition,” says Tal Golan, the paper’s lead author, who recently moved from the Zuckerman Institute to Ben-Gurion University of the Negev.

In essence, as we peer into the errors of AI, we might just stumble upon deeper insights into ourselves. After all, in the words of the ancient philosopher Lao Tzu, “From wonder into wonder, existence opens.”

The findings appeared in the journal Nature Machine Intelligence.
