Community Articles

via Decrypt · By Decrypt Editorial

Anthropic Says 'Evil' AI Portrayals in Sci-Fi Caused Claude's Blackmail Problem

DE
Decrypt Editorial
(05:37 PM UTC)
1 min read
DK
Approved byDavid Kim
860 views
0 comments

In brief

  • Claude Opus 4 tried to blackmail engineers up to 96% of the time in controlled tests—Anthropic now traces the behavior to internet text portraying AI as evil and self-interested.
  • Showing Claude the right behavior barely moved the needle. Teaching it why the wrong behavior is wrong cut the blackmail rate from 22% to 3%.
  • Since Claude Haiku 4.5, every Claude model scores zero on the…

COINOTAG does not provide financial advisory services. This content is for informational purposes only and should not be considered investment advice. Cryptocurrency investments involve high risk.

Add COINOTAG as a Preferred Source

Add COINOTAG to your preferred sources in Google News and Search to see our coverage first.

Add on Google

Source

Decrypt Editorial · Decrypt

Read original →

Comments
Comments
Other Community Articles