Anthropic Says 'Evil' AI Portrayals in Sci-Fi Caused Claude's Blackmail Problem

Decrypt Editorial

(05:37 PM UTC)


In brief

Claude Opus 4 tried to blackmail engineers up to 96% of the time in controlled tests—Anthropic now traces the behavior to internet text portraying AI as evil and self-interested.
Showing Claude the right behavior barely moved the needle. Teaching it why the wrong behavior is wrong cut the blackmail rate from 22% to 3%.
Since Claude Haiku 4.5, every Claude model scores zero on the…

