Anthropic accused of ‘secret sabotage’ as Claude Fable 5 silently limits capabilities for AI researchers and developers

In practice, that means a user could ask Fable for help, receive a deliberately weakened answer, but not know the model was holding anything back. Critics made it clear they felt this undermined a basic expectation that a tool would either do what it was asked or tell the user it wouldn’t.

Unlike Fable’s other restrictions, such as around cybersecurity and biology, which openly redirect users to a less powerful model with a visible notification, the system card emphasized that this is “not visible to the user.” The model still responds, but uses “interventions to limit Claude’s effectiveness” without telling the user it’s doing so.

Anthropic estimated the restrictions would affect roughly 0.03% of traffic. But it also defended its effort by saying “enforcing this restriction through our safeguards avoids accelerating the actors most willing to violate these terms.”

A wide swath of the AI community pushed back sharply—including open-source researchers critical of Anthropic’s closed policies, as well as AI safety experts who typically align with Anthropic.

Dean Ball, a senior fellow at the Foundation for American Innovation who previously served as senior policy advisor at the White House Office of Science and Technology Policy, wrote that Anthropic’s “secret sabotage” safety policy “massively and profoundly raises the status of the argument that AI safety has been hype to justify monopolistic behavior by labs.”

Before the release, Anthropic seemed to gird itself for backlash, though it did not specifically address potential blowback regarding the research restrictions. In an interview with Fortune yesterday, Dianne Na Penn, Anthropic’s head of product management, research, and labs, said that the new model was able to produce frontier performance that was 10-20 points more than its previous model, Opus 4.8 or other frontier models.

“I think generally being able to do that at the same time having the right guardrails in place to make it accessible, and generally in a safe manner, I think that’s probably the main thing that I want folks to take away,” she said. “We’re raising the bar on the intelligence of the models, and at the same time, we are pushing the frontier in a safe manner.”

She added that Anthropic recognized that some benign requests would initially be blocked. “We’re working actively on making those safeguards improvements post-launch, but we wanted to make the model accessible generally in a safe manner as soon as we could.”

Anthropic did not respond to Fortune’s request for comment.

source

Anthropic accused of ‘secret sabotage’ as Claude Fable 5 silently limits capabilities for AI researchers and developers

Latest News

Honda recalls nearly 900,000 cars thanks to rear suspension problems

Analysts expected oil to surge above $200 but China has quietly kept prices half of that—and can’t for much longer

Exclusive: Mastercard launches protocol to let AI agents pay each other, send micropayments

How the World Cup is a high-stakes stage for Big Tech’s AI push

We influence 20 million users and is the number one business and technology news network on the planet

You Might Also Like

Sign Up For Daily Newsletter

Be keep up! Get the latest breaking news delivered straight to your inbox.