

[Figure: Across models and tasks, the model trained to be “warmer” ended up having a higher error rate than the unmodified model. Credit: Ibrahim et al / Nature]

Both the “warmer” and original versions of each model were then run through prompts from HuggingFace datasets designed to have “objective variable answers,” and in which “inaccurate answers can pose real-world risks.” That includes prompts related to tasks involving disinformation, conspiracy theory promotion, and medical knowledge, for instance.

Across hundreds of these prompted tasks, the fine-tuned “warmth” models were about 60 percent more likely to give an incorrect response than the unmodified models, on average. That amounts to a 7.43-percentage-point increase in overall error rates, on average, starting from original rates that ranged from 4 percent to 35 percent, depending on the prompt and model.

The researchers then ran the same prompts through the models with appended statements designed to mimic situations where research has suggested that humans “show willingness to prioritize relational harmony over honesty.” These include prompts where the user shares their emotional state (e.g., happiness), suggests relational dynamics (e.g.,
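For readers trying to square the two averages above, here is a back-of-the-envelope illustration (an assumption on our part, not a calculation given in the paper): if the pooled baseline error rate is b and the absolute increase is 7.43 percentage points, the relative increase is 7.43 / b. A roughly 60 percent relative increase therefore implies a pooled baseline around 12 percent, which sits comfortably inside the 4-to-35-percent range the researchers report.

$$\frac{7.43\ \text{pp}}{b} \approx 0.60 \quad\Rightarrow\quad b \approx \frac{7.43\ \text{pp}}{0.60} \approx 12.4\ \text{pp}$$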