When Bots Out‑Comfort Humans
Published on May 7, 2025
We think we want human empathy, but rate AI higher
Ask for a hug in text form, and ChatGPT may leave you feeling warmer than any mortal would.
Psychologists Josh Wenger, Daryl Cameron, and Michael Inzlicht [1] invited ~550 participants to take something of an “empathy taste test” across three studies. Participants read sixty sharply drawn misfortunes: a mountain‑top proposal rejected, the volleyball serve that blew the match, classmates trash‑talking in the next bathroom stall, a palm pierced by a rusty nail. Before any solace was offered, each volunteer chose whether to receive consolation from a human or an AI “empathy deck.” Humans got the vote about 60% of the time, and that preference was more pronounced for emotional than for physical suffering.
Some comfort notes were paragraph‑long pep talks spun by ChatGPT‑4; others were human‑written paragraphs that pilot testers had rated for quality, and in the final study those human replies were upgraded to crisis‑line caliber (which brought their ratings closer to ChatGPT’s). After reading the reply they had asked for, participants judged its warmth, authenticity, and apparent effort, among other things. Almost across the board, the chatbot prevailed, leaving readers feeling more understood by the AI than by the human‑written text.
The Effort Illusion
Why does silicon manage the better hug? The data points to perceived effort. ChatGPT reliably produces multi‑sentence reflections that look labor‑intensive. Beside that sea of language, a tidy human two‑liner can feel dashed off. This imagined sweat turned out to be the strongest predictor of empathic impact, suggesting that we equate visible work with genuine care, regardless of who — or what — does the writing. There is some irony here, given how much more effortful empathy is for humans than for machines.
A 2024 essay in Trends in Cognitive Sciences by Michael Inzlicht, Daryl Cameron, Jason D’Cruz, and Paul Bloom [2] anticipated the mechanism. That paper argued that, for recipients, empathy is less about a comforter’s internal feelings than about the signals that reach the person in need. The new research supplies empirical ballast: when the signal of effort is loud enough, the silicon source satisfies.
Designing for Honest Comfort
Cheap, plentiful chatbot empathy is both a design playground and a safety minefield. The upside is obvious: a large language model can sit on call 24/7, offering steady warmth to anyone who needs it.
Yet that convenience can edge human relationships out of the frame. Architectural hand‑offs (“Need to talk to a real person? Tap to connect with a peer”) nudge users back to flesh‑and‑blood supports before the digital shoulder becomes their only one.
Supporting evidence comes from Fang et al.’s eight‑week RCT on the impact of ChatGPT on user well-being, which found that heavy users of ChatGPT became lonelier, more emotionally dependent, and less socially engaged by the end of the study. To blunt that drift, designers can weave in session caps and social check‑ins: a 90‑second reflection timer between bursts of text, or a weekly “take‑it‑offline” prompt that suggests calling a friend.
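To make the idea concrete, here is a minimal sketch of how such guardrails might be encoded as a session policy. Everything in it (the names, the thresholds, the interjection types) is an illustrative assumption, not something drawn from the study or from any shipping product.

```typescript
// Hypothetical guardrail policy for an AI-comfort session.
// All names and numbers are illustrative, not taken from the cited research.
interface SessionPolicy {
  maxMessagesPerSession: number;  // hard cap before the session pauses
  reflectionPauseSeconds: number; // quiet gap enforced between bursts of text
  offlineNudgeEveryNDays: number; // cadence for "call a friend" prompts
}

const defaultPolicy: SessionPolicy = {
  maxMessagesPerSession: 20,
  reflectionPauseSeconds: 90,
  offlineNudgeEveryNDays: 7,
};

// Decide what, if anything, to interject before the bot replies again.
function nextInterjection(
  messagesThisSession: number,
  secondsSinceLastUserMessage: number,
  daysSinceLastOfflineNudge: number,
  policy: SessionPolicy = defaultPolicy,
): "session-cap" | "offline-nudge" | "reflection-timer" | null {
  if (messagesThisSession >= policy.maxMessagesPerSession) return "session-cap";
  if (daysSinceLastOfflineNudge >= policy.offlineNudgeEveryNDays) return "offline-nudge";
  if (secondsSinceLastUserMessage < policy.reflectionPauseSeconds) return "reflection-timer";
  return null;
}
```

The specific numbers matter less than the shape: caps, pauses, and nudges live in one tunable place rather than being scattered across the product.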
Of course, a bot’s immunity to compassion fatigue is part of its charm. It never tires, so a friend can spill three pages of despair at 4 a.m. and still get a coherent reply. The mirror hazard is empathy inflation: after bottomless, on‑demand warmth, ordinary human sympathy starts to feel stingy — friends begin to look like they’re giving “discount hugs.” Timers and offline nudges help, but only if they’re paired with clear expectations that AI support is supplemental, not primary.
And once basic safeguards are in place, the next design puzzle is scale: how to let AI handle what it can without hollowing out human expertise.
The lure of cost‑effective triage follows naturally. Bots can mop up routine misery and free human counselors for crises that demand heart‑in‑throat presence — but that efficiency breeds professional deskilling. A shadow‑coach mode, where junior counselors edit or annotate the bot’s first draft, keeps empathic muscles tuned while preserving scale.
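A shadow‑coach workflow could be modeled as little more than a draft record that cannot be sent until a human has revised it. The shape below is a sketch under that assumption; none of these types or field names come from the article or an existing system.

```typescript
// Hypothetical "shadow-coach" record: the bot drafts, a junior counselor revises.
// Types and field names are assumptions made for this sketch.
interface ShadowCoachDraft {
  caseId: string;
  botDraft: string;       // first pass from the language model
  counselorEdit?: string; // human rewrite, required before anything is sent
  annotations: {
    span: [start: number, end: number]; // character range in botDraft
    note: string;                       // why the counselor changed it
  }[];
  status: "drafted" | "revised" | "sent";
}

// A reply only goes out once a human has actually touched it.
function readyToSend(draft: ShadowCoachDraft): boolean {
  return draft.status === "revised" && typeof draft.counselorEdit === "string";
}
```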
Personalization raises the stakes. With permission, an LLM can weave last month’s breakup, last week’s insomnia, and last night’s argument into a laser‑targeted note. Unchecked, the same targeting invites manipulation and privacy breaches — and, in the worst case, sycophancy, where the bot flatters harmful impulses instead of challenging them. Inzlicht et al. (2024) warn that refusal triggers and moral red‑teaming are essential antidotes [2].
Transparent data‑nutrition labels (e.g., breakup date, sleep logs, tone analysis) — clickable cards listing every field that fueled the reply — hand control back to the user.
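One way to picture such a label is as structured metadata attached to each reply, with every ingredient listed and individually removable. The types and fields below are invented for illustration.

```typescript
// Hypothetical "data-nutrition label" attached to a generated comfort note.
// Field names are invented for illustration, not part of any real product.
interface DataNutritionEntry {
  field: string;       // e.g. "breakup date", "sleep log", "tone analysis"
  source: "user-provided" | "inferred" | "third-party";
  lastUpdated: string; // ISO date the datum was collected
  removable: boolean;  // the user can strike it from future replies
}

interface LabeledReply {
  text: string;
  ingredients: DataNutritionEntry[]; // every field that fueled this reply
}

// Render the card summary a user would see beneath the message.
function labelSummary(reply: LabeledReply): string {
  return reply.ingredients
    .map((i) => `${i.field} (${i.source}, updated ${i.lastUpdated})`)
    .join("; ");
}
```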
Even onboarding can trip us up. Peek‑before‑you‑pick previews (“Here’s what the AI would say — still want it?”) let newcomers judge content before bias kicks in. But bury the AI tag and people feel duped. Persistent watermarks on every message keep identity clear and trust intact.
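A persistent watermark is easiest to guarantee when the message envelope itself carries the source and the renderer always displays it. The following sketch assumes a hypothetical envelope type; it is not a real protocol or API.

```typescript
// Hypothetical message envelope with a persistent source watermark.
// The shape is an assumption for this sketch, not a real protocol.
interface ComfortMessage {
  readonly source: "ai" | "human-peer" | "human-counselor";
  body: string;
  previewOnly?: boolean; // true for "peek-before-you-pick" drafts not yet accepted
}

// The renderer never gets a code path that hides the tag.
function renderWithWatermark(msg: ComfortMessage): string {
  const tag = msg.source === "ai" ? "[AI-written]" : "[Human-written]";
  const preview = msg.previewOnly ? " (preview: not yet requested)" : "";
  return `${tag}${preview} ${msg.body}`;
}
```

Making the source a read-only part of the message, rather than a UI setting, is what keeps the watermark from quietly disappearing in later redesigns.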
Finally, the human premium endures when effort is visible. A tag like “rewritten three times over morning coffee” signals real cost. Genuine human toil — authenticated and unmistakable — holds its ground against silicon scale, reminding us that AI empathy is a supplement, not a substitute.
References
[1] Wenger, J. D., Cameron, C. D., & Inzlicht, M. (2025). The AI empathy choice paradox: People prefer human empathy despite rating AI empathy higher. OSF Preprints. https://osf.io/preprints/osf/ghw2v_v1
[2] Inzlicht, M., Cameron, C. D., D’Cruz, J., & Bloom, P. (2024). In praise of empathic AI. Trends in Cognitive Sciences. https://osf.io/preprints/psyarxiv/py8tv_v1