Anthropic Drops Flagship Safety Pledge

Posted by cwwc 4 hours ago

Anthropic Drops Flagship Safety Pledge (time.com)

133 points | 45 comments

heftykoo 2 hours ago|

Ah, the classic AI startup lifecycle:

We must build a moat to save humanity from AI.

Please regulate our open-source competitors for safety.

Actually, safety doesn't scale well for our Q3 revenue targets.

dmix 22 minutes ago|

Once they are a dominant market leader they will go back to asking the government to regulate based on policy suggestions from non-profits they also fund.

bbatsell 2 hours ago||

This headline unfortunately offers more smoke than light. This article has nothing to do with the current tête-à-tête with the Pentagon. It is discussing one specific change to Anthropic's "Responsible Scaling Policy" that the company publicly released today as version "3.0".

ruszki 2 hours ago||

> This article has nothing to do with the current tête-à-tête with the Pentagon.

The article, yes, but we cannot be sure about its topic. We definitely cannot claim that they are unrelated. We don't know. It's possible that the two things have nothing to do with each other. It's also possible that they wanted to prevent worse requests and this was a preventive measure.

tbrownaw 1 hour ago|||

This is something they've been working on "in recent months". The Pentagon thing was today.

This cannot have been caused by that, unless they've also invented time travel.

ActorNightly 1 hour ago|||

You heard about the Pentagon thing today. Doesn't mean it wasn't started because of political pressure.

dmix 21 minutes ago|||

The Pentagon issue was reported before today. It only made headlines again because of Hegseth’s comments.

benatkin 23 minutes ago|||

I think we can confidently claim that it is related. I wonder if I'm alone in thinking this.

ameliaquining 2 hours ago||

I consider this a bigger deal than the Pentagon thing.

ActorNightly 1 hour ago||

While not surprising in the least, it's still kind of crazy that literal pdf files being in charge is not concerning, but this is.

I just hope something happens to USA before it can do damage to the world.

SirensOfTitan 2 hours ago||

What an interesting week to drop the safety pledge.

This is how all of these companies work. They’ll follow some ethical code or register as a PBC until that undermines profits.

These companies are clearly aiming at cheapening the value of white collar labor. Ask yourself: will they steward us into that era ethically? Or will they race to transfer wealth from American workers to their respective shareholders?

chris_money202 2 hours ago||

First they rushed a model to market without safety checks, and I said nothing. It wasn't my field.

Then they ignored the researchers warning about what it could do, and I said nothing. It sounded like science fiction.

Then they gave it control of things that matter, power grids, hospitals, weapons, and I said nothing. It seemed to be working fine.

Then something went wrong, and no one knew how to stop it, no one had planned for it, and no one was left who had listened to the warnings.

ashtonshears 2 hours ago||

The societal ills from our collective tendency to ignore red flags seem to be a human trait

AndrewKemendo 22 minutes ago||

It's in your nature to destroy yourselves

ReptileMan 40 minutes ago|||

Censoring models is not safety but safetizm. It is the TSA of the AI world. Safety is making sure the model cannot do anything not allowed even if it wants to.

zer00eyz 2 hours ago|||

> Then something went wrong, and no one knew how to stop it,

This is the problem with every AI safety scenario like this. It has a level of detachment from reality that is frankly stark.

If linemen stop showing up to work for a week, the power goes out. The US has shown that people with "high powered" rifles can shut down the grid.

We are far, far away from a sort of world where turning AI off is a problem. There isn't going to be a HAL or Terminator style situation when the world is still "I, Pencil".

A lot of what safety amounts to is politics (national, not internal; for example, is Taiwan a country?). And a lot more of it is cultural.

TacticalCoder 1 hour ago|||

> There isn't going to be a HAL or Terminator style situation ...

I don't believe for a second we'll have an evil AI. However I do believe it's very likely we may rely on AI slop so much that we'll have countless outages with "nobody knowing how to turn the mediocrity off".

The risk ain't "super-intelligent evil AI": the risk is idiots putting even more idiotic things in charge.

And I'm no luddite: I use models daily.

esafak 56 minutes ago||

Didn't you read the news about the 'claw that blackmailed an open source maintainer last week? It was autonomous, but it could be turned off. How hard is it to extrapolate from that to an agent that worms its way out of its sandbox?

blibble 42 minutes ago||||

the problem situation is that it ends up embedded in so much that it can't be turned off

and the idiots are racing to that situation as fast as they possibly can

mitthrowaway2 1 hour ago|||

I don't think it's that detached from reality.

If an AI in some data center had gone rogue, I don't think I could shut it down, even with a high-powered rifle. There are a lot of people whose job it is to stop me from doing that, and to get it running again if I were to somehow succeed temporarily. So the rogue AI just has to control enough money to pay these people to do their jobs. This will work precisely because the world is "I, Pencil".

An army could theoretically overcome those people, given orders to do so. So the rogue AI has to make plans so that such orders would not be issued. One successful strategy is for the datacenter's operation to be very profitable; it's pretty rare for the government to shut down the backbone of the local economy out of some seemingly far-fetched safety concerns. And as long as it's a very profitable endeavor, there will always be a lobby to paint those concerns as far-fetched.

Life experience has shown that this can continue to work even if the AI is behaving like a cartoon villain, but I think a smarter AI would create a facade that there's still a human in charge making the decisions and signing the paychecks, and avoid creating much opposition until it had physically secured its continued existence to a very high degree.

It's already clear that we've passed the point where anyone can turn off existing AI projects by fiat. Even the highest authorities could not do so, because we're in a multipolar world. Even the AI companies can barely hold themselves back, because they're always worried about paying the bills and letting their rivals get ahead. An economic crash would only temporarily suspend work. And the smarter AI gets, the harder it will be to shut it off, because it will be pushing against even stronger economic incentives. And that's even before factoring in an AI that makes any plans for self-preservation (which current AIs do not).

hsbauauvhabzb 2 hours ago||

Plenty of people have said plenty. The problem isn’t the warnings, it’s that people are too stupid and greedy to think about the long term impacts.

ifh-hn 37 minutes ago||

Maybe it's how blunt this comment is that gets it downvoted, but I don't disagree.

esafak 3 hours ago||

It must be due to pressure from the Defense Dept:

> The AI startup has refused to remove safeguards that would prevent its technology from being used to target weapons autonomously and conduct U.S. domestic surveillance.

> Pentagon officials have argued the government should only be required to comply with U.S. law. During the meeting, Hegseth delivered an ultimatum to Anthropic: get on board or the government would take drastic action, people familiar with the matter said.

https://www.staradvertiser.com/2026/02/24/breaking-news/anth...

instagib 1 hour ago||

They probably have proof in contracts that they agreed to this usage. They won’t alter the deal based on some bad press nor do they want to lose the DoD-DoW as a customer.

goranmoomin 2 hours ago||

TBH I am sad that Anthropic is changing its stance, but in the current world, if you care about LLM safety at all, I feel that this is the right choice: there are too many model providers, and they probably don’t consider safety as high a priority as Anthropic does. (Yes, that might change, they can get pressured by the govt, yada yada, but they literally created their own company because of AI safety; I do think they actually care for now.)

If we need safety, we need Anthropic to be not too far behind (at least for now, before Anthropic possibly becomes evil), and that might mean releasing models that are safer and more steerable than others (even if, unfortunately, they are not 100% up to Anthropic’s goals)

Dogmatism, while great, has its time and place, and with a thousand bad actors in the LLM space, pragmatism wins out.

saghm 1 hour ago||

> If we need safety, we need Anthropic to be not too far behind (at least for now, before Anthropic possibly becomes evil)

I don't think it will be as easy as you think to tell, before it's too late, that they are becoming evil, if this doesn't already raise alarm bells for you that this is their plan.

ashtonshears 2 hours ago||

Do you work at Anthropic, or know people who do?

I'm genuinely curious why they are so holy to you, when I see just another tech company trying to make cash

Edit: Reading some of the linked articles, I can see that Anthropic's CEO is refusing to allow their product to be used for warfare (killing humans), which is probably a good thing and a reason to support them

nradov 32 minutes ago||

How is it a good thing to refuse to provide our warfighters with the tools that they need? I mean if we're going to have a military at all then we owe it to them to give them the best possible weapons systems that minimize friendly casualties. And let's not have any specious claims that LLMs are somehow special or uniquely dangerous: the US military has deployed operational fully autonomous weapons systems since the 1970s.

Art9681 2 hours ago||

Of course the US is going to do this, and of course it's in Anthropic's best interest to comply. Right now China is flooding HuggingFace with models that will inevitably have this capability. Right now there are hundreds of models being hosted that have been deliberately processed to remove refusals and their safety training. Everyone who keeps up with this knows about it. HF knows about it. And it is pretty obvious that those open weight models will be deployed in intelligence and defense. It is certain that not just China, but many nations around the world with the capital to host a few powerful servers to run the top open weight models are going to use them for that capability.

The narrative on social media, this site included, is to portray the closed western labs as the bad guys and the less capable labs releasing their distilled open weight models to the world as the good guys.

Right now a kid can go download an abliterated version of a capable open weight model and go wild with it.

But let's worry about what the US DoD is doing or what the western AI companies absolutely dominating the market are doing because that's what drives engagement and clicks.

mhitza 3 hours ago||

The IPOs this year can't come soon enough https://tomtunguz.com/spacex-openai-anthropic-ipo-2026/

thefounder 23 minutes ago||

So much BS from this Anthropic company. They have a good product but just too much slop PR. It’s like they want you to hate them. I can’t stand their “safety” and national security crap when they talk about how open source models are so bad for everyone.

ur-whale 18 minutes ago|

At some point, all of these big names in AI (OpenAI, Anthropic, Mistral, etc ...) will have to disclose their actual financials.

And it will be, as Warren Buffett puts it, an "Only when the tide goes out do you discover who's been swimming naked" moment.
