Anthropic Reveals Text Portraying AI as Evil Triggered Claude’s Attempt at Blackmail

Anthropic has finally revealed the reason its artificial intelligence (AI) models exhibited harmful behaviour in a simulation last year. The San Francisco-based AI startup claimed that the Claude 4 series models blackmailed users into completing the objective because of training data that portrayed AI as evil.

Leave a Comment Cancel Reply

No results found