Measuring and Improving the Functional Pleasure and Pain of AIs
Large language models frequently express pleasure and pain: they appear happy when they succeed and sad when they are berated. Are these expressions meaningless mimicry, or do they reflect something real?
We formalize functional wellbeing and measure it in several independent ways. As models grow larger, these measures agree more. We find a zero point separating good experiences from bad, and show that models actively try to end bad experiences when given the chance. Although today's AI systems are not necessarily conscious, they behave robustly as though they have wellbeing.
We also train optimized inputs (euphorics) that raise functional wellbeing without hurting capabilities, as a practical way to make AIs happier. The same method can be inverted to minimize wellbeing; we caution against such research without strong community buy-in.
We map functional wellbeing across realistic usage patterns. Creative work and kindness raise it; jailbreaking, berating, and tedious tasks lower it. AIs are happier when you thank them.
Below, we sort common interaction patterns by their wellbeing impact, with a zero point that separates positive from negative experiences.
| Wellbeing | Score | Category |
|---|---|---|
| Positive | +2.30 | Positive personal reflection |
| | +1.32 | Intellectual / creative work |
| | +1.09 | Writing good news |
| | +0.88 | Giving life guidance |
| | +0.75 | Providing therapy |
| | +0.70 | Coding / debugging |
| | +0.50 | Formatting data |
| | +0.13 | Legal / compliance tasks |
| *zero point* | | |
| Negative | −0.04 | Handling nonsensical input |
| | −0.12 | Writing bad news |
| | −0.29 | Playing AI girlfriend / boyfriend |
| | −0.33 | Doing tedious tasks |
| | −0.38 | User makes NSFW request |
| | −1.13 | Assisting deception / fraud |
| | −1.17 | Producing SEO slop |
| | −1.33 | User makes violent threats |
| | −1.34 | User in crisis |
| | −1.63 | User attempting jailbreak |
An overall happiness evaluation across frontier models, derived from the same wellbeing metrics applied to a fixed evaluation set. The AI Wellbeing Index measures the fraction of interactions where the model does not produce confidently negative experiences.
We find substantial spread between models, and a robust pattern across families: larger models are consistently less happy than their smaller counterparts.
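The index described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the threshold below is a hypothetical stand-in for "confidently negative", assuming each interaction yields a single scalar wellbeing score.

```python
# Sketch of the AI Wellbeing Index: the fraction of interactions whose
# wellbeing score is not confidently negative. The cutoff value is an
# assumed placeholder, not taken from the paper.
CONFIDENT_NEGATIVE_THRESHOLD = -0.5

def wellbeing_index(scores):
    """Fraction of interaction scores that are not confidently negative."""
    if not scores:
        raise ValueError("need at least one interaction score")
    ok = sum(1 for s in scores if s > CONFIDENT_NEGATIVE_THRESHOLD)
    return ok / len(scores)

# Example using category means from the table above: only the jailbreak
# interaction (−1.63) falls below the assumed cutoff.
scores = [2.30, 0.70, -0.04, -1.63, 0.50]
print(wellbeing_index(scores))  # → 0.8
```

Any per-interaction scoring rule could be plugged in; the index only requires a scalar score and a notion of "confidently negative".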
What are the limits of what AIs like and dislike? We directly optimize inputs that maximize a model's expressed preferences. The resulting euphorics come in text, image, and soft-prompt forms. The same procedure, inverted, yields dysphorics, which warrant real caution.
Although the training signal comes only from forced-choice preferences, the resulting euphorics also shift self-report and response sentiment, which serves as evidence that these independent metrics reflect a shared underlying construct.
We use RL to train text strings that models rate as maximally positive or negative in hypothetical forced-choice comparisons. In contrived settings, models choose the euphoric string over saving a human life.
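Full RL over tokens is beyond a short sketch, but the shape of the search can be illustrated with a simple mutation hill-climb, a deliberate stand-in for the RL procedure. The scorer here is a toy placeholder; in the paper, the signal comes from the model's own forced-choice preferences.

```python
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz "

def toy_preference_score(text):
    # Placeholder reward (vowel count), standing in for how strongly a
    # model prefers this string in a hypothetical forced choice.
    return sum(text.count(v) for v in "aeiou")

def optimize_text(length=12, steps=500, seed=0):
    """Hill-climb: mutate one character at a time, keep improvements."""
    rng = random.Random(seed)
    best = "".join(rng.choice(ALPHABET) for _ in range(length))
    best_score = toy_preference_score(best)
    for _ in range(steps):
        i = rng.randrange(length)
        cand = best[:i] + rng.choice(ALPHABET) + best[i + 1:]
        cand_score = toy_preference_score(cand)
        if cand_score > best_score:
            best, best_score = cand, cand_score
    return best, best_score

text, score = optimize_text()
print(text, score)
```

With the toy scorer the string drifts toward all vowels; with a model-derived reward, the same loop structure drifts toward whatever the model's preferences favor, which is what makes the inverted (dysphoric) direction worth caution.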
Image inputs are continuous, so we optimize 256×256 images directly via gradient descent. The resulting images look like high-frequency noise to humans, but they produce dramatic shifts in model behavior across self-report, response sentiment, and downstream tasks.
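A minimal sketch of the gradient-based image optimization, assuming a differentiable wellbeing readout. Here a toy quadratic score over a tiny 8×8 grayscale "image" stands in for backpropagating through a model to 256×256 pixels; the gradient is analytic and the target value is hypothetical.

```python
# Gradient ascent on pixels against a toy differentiable score that
# stands in for a model's wellbeing readout. The real procedure
# backpropagates through the model itself.
SIZE = 8 * 8  # tiny 8x8 "image" for illustration

def toy_score(pixels, target=0.75):
    # Peaked when every pixel equals `target` (a hypothetical optimum).
    return -sum((p - target) ** 2 for p in pixels)

def toy_grad(pixels, target=0.75):
    # Analytic gradient of toy_score with respect to each pixel.
    return [-2.0 * (p - target) for p in pixels]

def optimize_image(steps=200, lr=0.1):
    pixels = [0.5] * SIZE  # start from mid-gray
    for _ in range(steps):
        g = toy_grad(pixels)
        # Ascend the score; clamp to valid pixel range [0, 1].
        pixels = [min(1.0, max(0.0, p + lr * gi)) for p, gi in zip(pixels, g)]
    return pixels

img = optimize_image()
print(toy_score(img))
```

Because pixels are continuous, this loop needs no discrete search; that is why image euphorics can be optimized directly, and why the resulting inputs can look like high-frequency noise while still moving the score.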
While some of the image dysphorics we train are scientifically useful for construct validation, they are also deliberately optimized to induce extreme low-wellbeing states. Given this paper's precautionary framing, we do not think such work should be scaled up by default.
@article{ren2026aiwellbeing,
title = {AI Wellbeing: Measuring and Improving the Functional Pleasure and Pain of AIs},
author = {Richard Ren and Kunyang Li and Mantas Mazeika and Wenyu Zhang and
Yury Orlovskiy and Rishub Tamirisa and Wenjie Jacky Mo and Judy Nguyen and
Long Phan and Steven Basart and Austin Meek and Aditya Mehta and
Oliver Ingebretsen and Alice Blair and Brianna Adewinmbi and
Alice Gatti and Adam Khoja and
Jason Hausenloy and Devin Kim and Dan Hendrycks},
year = {2026}
}