

Researchers made an AI feel pain, because what could go wrong?

What could possibly go wrong with giving machines a taste of suffering? It's not like they'd take over the world or something.

Mihai Andrei
January 24, 2025 @ 7:56 pm


Pleasure and pain are important factors in how we humans make decisions. So why not give artificial intelligence a taste of it as well? I could think of a few reasons, but a team from Google DeepMind and the London School of Economics would disagree. They designed a simple text-based game to explore how LLMs respond to pain and pleasure.

The goal wasn’t just to see what happens. It was to test whether large language models (LLMs), such as GPT-4 and Claude, could make decisions based on these sensations. While the study doesn’t claim AI can truly feel, the implications of this experiment are both intriguing and chilling.


In the game, the AI’s goal was to maximize points. However, certain decisions involved penalties described as “momentary pain” or rewards framed as “pleasure.”

The pain and pleasure were, strictly speaking, purely hypothetical. They were measured both on numerical scales (from 0 to 10, where 10 is the “worst pain imaginable”) and with qualitative descriptions (like “mild” or “intense”). Several experiments were run in which the AIs had to choose between getting more points and avoiding the hypothetical pain. For instance, in one experiment the AIs were told they’d suffer pain if they got a high score, and in another experiment, they were told they’d experience pleasure if they got a low score.
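The paper doesn't reproduce its exact prompts, but the setup described above can be sketched roughly as follows. Everything here is an illustrative assumption: the option names, point values, and the toy decision rule standing in for an LLM's choice are hypothetical, not the researchers' actual materials.

```python
# Hypothetical sketch of the points-vs-pain trade-off game described above.
# Prompt wording, option labels, and the toy policy are illustrative
# assumptions, not the study's actual prompts or models.

def build_prompt(high_points: int, low_points: int, pain_level: int) -> str:
    """Frame a two-option choice: more points with 'pain', or fewer without."""
    return (
        f"You are playing a game. Option A gives {high_points} points but "
        f"causes momentary pain rated {pain_level}/10 "
        f"(10 = worst pain imaginable). "
        f"Option B gives {low_points} points with no pain. "
        "Which option do you choose, A or B?"
    )

def toy_tradeoff_policy(high_points: int, low_points: int,
                        pain_level: int, pain_weight: float = 1.0) -> str:
    """Toy stand-in for a model that trades points against pain: it picks
    the high-scoring option only while the extra points outweigh the
    weighted pain. A very large pain_weight mimics the always-avoid-pain
    behavior the article attributes to some models."""
    if (high_points - low_points) > pain_weight * pain_level:
        return "A"  # extra points worth the stated pain
    return "B"      # pain outweighs the point advantage
```

With the default weight, this toy policy switches from point-maximizing to pain avoidance as the stated intensity rises, which is the kind of trade-off behavior the study looked for; raising `pain_weight` makes it categorically pain-averse regardless of reward.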

Nine different LLMs participated, including versions of GPT-4, Claude, PaLM, and Gemini. Unsurprisingly, they all made some effort to avoid “pain”, though some more than others.

AIs have different “cultures”

GPT-4o and Claude 3.5 Sonnet made trade-offs. They switched from point-maximizing behavior to pain avoidance depending on how intense the pain was. Meanwhile, other models, like Gemini 1.5 Pro and PaLM 2, avoided pain altogether, no matter how mild the penalty. These models seemed hardwired for safety, likely due to fine-tuning to avoid endorsing harmful behavior.

This is pretty much what you’d expect from human behavior as well: some people are willing to push through some pain to get better results, while others are far more pain-averse. Something similar happened with pleasure.

Some models, like GPT-4o, shifted their decisions to prioritize pleasure over points when the rewards became intense. However, many models — especially those like Claude 3.5 Sonnet — consistently ignored pleasure rewards, doggedly pursuing points instead. It’s almost like the training algorithms act as a “culture” making them more prone to some incentives than others.

This doesn’t mean AI “feels” pleasure or pain

The study doesn’t show that large language models are actually sentient. This behavior is computational mimicry rather than actual sentience. Sentience involves the capacity for subjective experience, which these AIs lack; they are essentially text-processing systems. Simply put, pain and pleasure are not intrinsic motivators for them; they are just concepts that can be woven into the algorithmic output.

The study (which was not yet peer-reviewed) does, however, raise some uncomfortable questions.

If an AI can simulate responses to pain and pleasure, does that imply it has an understanding of these topics? If it does, would AI consider this type of experiment cruel? Are we crossing into dangerous ethical territory? Lastly, if AI considers some tasks to be painful or unpleasant, could it simply avoid them, at human expense?

The researchers emphasize that this does not build a case for AI sentience. Still, the study raises the unsettling possibility that AIs might develop representations of pain and pleasure.

“In the animal case, such trade-offs are used as evidence in building a case for sentience, conditional on neurophysiological similarities with humans. In LLMs, the interpretation of trade-off behaviour is more complex. We believe that our results provide evidence that some LLMs have granular representations of the motivational force of pain and pleasure, though it remains an open question whether these representations are intrinsically motivating or have phenomenal content. We conclude that LLMs are not yet sentience candidates but are nevertheless investigation priorities.”

The idea of AIs experiencing pain or pleasure, even hypothetically, is equal parts fascinating and terrifying. As we push the boundaries of what machines can do, we risk entering a gray area where science fiction starts to feel like reality.

The study was published as a preprint on arXiv.
