System Card: Claude Mythos Preview [pdf]

Posted by be7a 3 hours ago

System Card: Claude Mythos Preview [pdf](www-cdn.anthropic.com)

222 points | 142 comments | page 2

waNpyt-menrew 2 hours ago|

Larger model, better benchmarks. Bigger bomb more yield.

Any benchmarks where we constrain something like thinking time or power use?

Even if this were released, there's no way to know if it’s the same quant.

omcnoe 47 minutes ago|

Yes - e.g. page 192, the BrowseComp benchmark.

Mythos Preview has higher accuracy with fewer tokens used than any previous Claude model. Though, the fact that this incredibly strong result was only presented for BrowseComp (a kind of weird benchmark about searching for hard to find information on the internet) and not for the other benchmarks implies that this result is likely not the same for those other benchmarks.

gessha 2 hours ago||

It would be funny if Alibaba extended the free trial on openrouter/Qwen 3.6 until they collect enough data to beat Anthropic.

jdthedisciple 1 hour ago||

Opus 4.6 is already incredible so this leap is huge.

Although, amusingly, today Opus told me that the string 'emerge' is not going to match 'emergency' by using `LIKE '%emerge%'` in SQLite.

Moment of disappointment. Otherwise great.
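
For anyone who wants to check the `LIKE` behaviour themselves, here is a minimal sketch using Python's built-in `sqlite3` module. The helper name `like_matches` is made up for illustration; the query just asks SQLite to evaluate the `LIKE` expression directly against an in-memory database.

```python
import sqlite3


def like_matches(value: str, pattern: str) -> bool:
    """Return True if SQLite's LIKE operator matches `value` against `pattern`."""
    conn = sqlite3.connect(":memory:")
    try:
        # LIKE evaluates to 1 (match) or 0 (no match) for non-NULL operands.
        (result,) = conn.execute("SELECT ? LIKE ?", (value, pattern)).fetchone()
        return bool(result)
    finally:
        conn.close()


# '%' matches any run of characters, so '%emerge%' is a substring test.
print(like_matches("emergency", "%emerge%"))  # True
print(like_matches("urgent", "%emerge%"))     # False
```

So `'%emerge%'` does match 'emergency', since 'emerge' is a prefix of it and `%` also matches the empty string.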

bornfreddy 1 hour ago||

I only have 3 points against LLMs: they lack reason and they can't count.

FeepingCreature 1 hour ago||

'emer ge' is two tokens, 'emergency' is one. The models think in a logosyllabic language.

mpalmer 3 hours ago||

> Claude Mythos Preview’s large increase in capabilities has led us to decide not to make it generally available.

A month ago I might have believed this, now I assume that they know they can't handle the demand for the prices they're advertising.

IceWreck 2 hours ago||

Didn't OpenAI say something similar about GPT-3? Too dangerous to open source, and then a few years later they were open sourcing gpt-oss because a bunch of oss labs were competing with their top models.

FeepingCreature 1 hour ago|||

OpenAI didn't release GPT-2 initially because they were worried it would make it too easy to generate spam. Which it kinda did.

abroszka33 1 hour ago|||

OpenAI said that GPT-5 was too dangerous to release... And look where we are now. It's mostly hype.

wg0 3 hours ago|||

That's for the investors basically. Scarcity and FOMO.

causal 1 hour ago||

*Until GPT-6 comes out, at which point Mythos will coincidentally be sufficiently safety-tested to release :)

skippyboxedhero 2 hours ago|||

GPT-2, o1, Opus...been here so many times. The reason they do this is because they know it works (and they seem to specifically employ credulous people who are prone to believe AGI is right around the corner). There haven't been significant innovations, the code generated is still not good but the hype cycle has to retrigger.

I remember when OpenAI created the first thinking model with o1 and there were all these breathless posts on here hyperventilating about how the model had to be kept secret, how dangerous it was, etc.

Fell for it again award. All thinking does is burn output tokens for accuracy, it is the AI getting high on its own supply, this isn't innovation but it was supposed to be super AGI. Not serious.

chaos_emergent 1 hour ago|||

> All thinking does is burn output tokens for accuracy

“All that phenomenon X does is make a tradeoff of Y for Z”

It sounds like you’re indignant about it being called thinking, that’s fine, but surely you can realize that the mechanism you’re criticizing actually works really well?

b65e8bee43c2ed0 2 hours ago||||

>I remember when OpenAI created the first thinking model with o1 and there were all these breathless posts on here hyperventilating about how the model had to be kept secret, how dangerous it was, etc.

I've read that about Llama and Stable Diffusion. AI doomers are, and always have been, retarded.

simianwords 2 hours ago||||

Incredible that people still think like this.

skippyboxedhero 2 hours ago||

You're completely right.

simianwords 2 hours ago||

uhh the model found actual vulnerabilities in software that people use. either you believe that the vulnerabilities were not found or were not serious enough to warrant a more thoughtful release

mlsu 1 hour ago||

So did GPT-4.

https://arxiv.org/html/2402.06664v1

Like think carefully about this. Did they discover AGI? Or did a bunch of investors make a leveraged bet on them "discovering AGI" so they're doing absolutely anything they can to make it seem like this time it's brand new and different.

If we're to believe Anthropic on these claims, we also have to just take it on faith, with absolutely no evidence, that they've made something so incredibly capable and so incredibly powerful that it cannot possibly be given to mere mortals. Conveniently, that's exactly the story that they are selling to investors.

Like do you see the unreliable narrator dynamic here?

mgfist 1 hour ago|||

On the other hand I've gotten to use opus-4.6 and claude code and the quality is off the charts compared to 2023 when coding agents first hit the scene. And what you're saying is essentially "If they haven't created God, I'm not impressed". You don't think there's some middle ground between those two?

Also they just hit a $30B run-rate, I don't think they're that needy for new hype cycles.

simianwords 1 hour ago|||

I don't see the problem here. How would you have handled it differently? If you released this model as such without any safety concern, the vulnerabilities might be found by bad actors and used for wrong things.

What do you find surprising here?

mlsu 55 minutes ago||

Vulnerabilities were found, probably a few by bad actors, when GPT-4 was released. Every vulnerability found now is probably found with AI assistance at the very least. Should they have never released GPT-4? Should we have believed claims that GPT-4 was too dangerous for mere mortals to access? I believe OpenAI was making similar claims about how GPT-4 was a step function and going to change white collar work forever when that model was released.

The point is that this whole "the model is too powerful" schtick is a bunch of smoke and mirrors. It serves the valuation.

simianwords 46 minutes ago||

It's far simpler to believe that they are releasing it step by step. Release to trusted third parties first, get the easy vulnerabilities fixed, work on the alignment and then release to the public.

Do you not believe that the vulnerabilities found by these agents are serious enough to warrant a staggered release?

vonneumannstan 2 hours ago|||

Lol you haven't used a model since GPT2 is what it sounds like.

skippyboxedhero 2 hours ago||

Just checked my subscription start date for Anthropic. September 2023, I believe before they announced public launch.

Sorry kid.

SyneRyder 2 hours ago|||

Genuine question - if you don't think the models are improved or that the code is any good, why do you still have a subscription?

You must see some value, or are you in a situation where you're required to test / use it, eg to report on it or required by employer?

(I would disagree about the code, the benefits seem obvious to me. But I'm still curious why others would disagree, especially after actively using them for years.)

skippyboxedhero 1 hour ago||

The assumption that the other person made was that I would only use it for coding. If you look through my other comments today, I suggest that they are useful for performing repetitive tasks i.e. checking lint on PR, etc. Also, can be used for throwaway code, very useful.

I don't think the issue is with the model, it is with the implication that AGI is just around the corner and that is what is required for AI to be useful...which is not accurate. The more grey area is with agentic coding but my opinion (one that I didn't always hold) is that these workflows are a complete waste of time. The problem is: if all this is true then how does the CTO justify spending $1m/month on Anthropic (I work somewhere where this has happened, OpenAI got the earlier contract then Cursor Teams was added, now they are adding Anthropic...within 72 hours of the rollout, it was pulled back from non-engineering teams). I think companies will ask why they need to pay Anthropic to do a job they were doing without Anthropic six months ago.

Also, the code is bad. This is something that is non-obvious to 95% of people who talk about AI online because they don't work in a team environment or manage legacy applications. If I interview somewhere and they are using agentic workflow, the codebase will be shit and the company will be unable to deliver. At most companies, the average developer is an idiot, giving them AI is like giving a monkey an AK-47 (I also say this as someone of middling competence, I have been the monkey with AK many times). You increase the ability to produce output without improving the ability to produce good output. That is the reality of coding in most jobs.

AI isn't good enough to replace a competent human, it is fast enough to make an incompetent human dangerous.

vonneumannstan 2 hours ago|||

So you are doubly stupid, by not seeing any improvement in the models and also paying for models you believe are terrible? lol

skippyboxedhero 2 hours ago||

That doesn't follow logically from what I said. You should ask your AI for help with this. You are in need of some artificial intelligence.

b65e8bee43c2ed0 2 hours ago||

you would be a fool to believe it at any point in time. Amodei is anthropomorphic grease, even more so than Altman.

Anthropic is burning through billions of VC cash. if this model was commercially viable, it would've been released yesterday.

landtuna 2 hours ago||

If there's limited hardware but ample cash, it doesn't make sense to sell compute-intensive services to the public while you're still trying to push the frontier of capability.

b65e8bee43c2ed0 2 hours ago||

that's more or less what I'm saying. "Claude Mythos Preview’s large increase in capabilities has led us to decide not to make it generally available", translated from bullshit, means "It would've cost four digits per 1M tokens to run this model without severe quantization, and we think we'll make more money off our hardware with lighter models. Cool benchmarks though, right?"

vonneumannstan 2 hours ago||

Are you guys ready for the bifurcation when the top models are prohibitively expensive to normal users? Is your AI budget $2000+ a month? Or are you going to be part of the permanent free tier underclass?

adi_kurian 2 hours ago||

If one is to believe the API prices are a reasonable representation of non-subsidized "real world pricing" (with model training being the big exception), then the models are getting cheaper over time. GPT 4.5 was $150.00 / 1M tokens IIRC. GPT o1-pro was $600 / 1M tokens.

vonneumannstan 2 hours ago||

You can check the hardware costs for self hosting a high end open source model and compare that to the tiers available from the big providers. Pretty hard to believe it's not massively subsidized. 2 years of Claude Max costs you $2,400. There is no hardware/model combination that gets you close to that price for that level of performance.

adi_kurian 1 hour ago||

Yes that's why I said API price. I once used the API like I use my subscription and it was an eye watering bill. More than that 2 year price in... a very short amount of time. With no automations/openclaw.
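
To put rough numbers on the API-versus-subscription gap, here is a back-of-the-envelope sketch. It only uses figures quoted in this thread (the $2,400 two-year subscription total and the per-1M-token prices recalled above, which were given "IIRC" and may be off); the function name `tokens_for_budget` is made up for illustration.

```python
def tokens_for_budget(budget_usd: float, price_per_million_usd: float) -> float:
    """Number of tokens a budget buys at a flat price per 1M tokens."""
    return budget_usd / price_per_million_usd * 1_000_000


# Figures as quoted in the thread, not verified against any price list.
two_year_subscription_usd = 2400
quoted_prices_usd = {"GPT 4.5": 150, "GPT o1-pro": 600}

for model, price in quoted_prices_usd.items():
    tokens = tokens_for_budget(two_year_subscription_usd, price)
    print(f"{model}: {tokens:,.0f} tokens for ${two_year_subscription_usd:,}")
```

At those quoted prices the same $2,400 buys about 16M tokens of GPT 4.5 or 4M tokens of o1-pro, which is why heavy use through the API can burn through a two-year subscription budget quickly.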

OsrsNeedsf2P 2 hours ago|||

Inference for the same results has been dropping 10x year over year[0]

[0] https://ziva.sh/blogs/llm-pricing-decline-analysis

ceejayoz 2 hours ago||

Sure, but "the same results" will rapidly become unacceptable results if much better results are available.

hibikir 2 hours ago|||

When we go with any other good in the economy, price is always relevant: After all, the price is a key part of any offering. There are $80-100k workstations out there, but most of us don't buy them, because the extra capabilities just aren't worth it vs., say, a $3000 computer, or even a $500 one. Do I need a top specialist to consult for a stomachache, at $1000 a visit? Definitely not at first.

There's a practical difference to how much better certain kinds of results can be. We already see coding harnesses offloading simple things to simpler models because they are accurate enough. Other things are dropped straight to normal programs, because they are that much more efficient than letting the LLM do all the things.

There will always be problems where money is basically irrelevant, and a model that costs tens of thousands of dollars of compute per answer is seen as a great investment, but as long as there's a big price difference, in most questions, price and time to results are key features that cannot be ignored.

swader999 2 hours ago||||

Yes, it will always be an arms race game.

esafak 2 hours ago|||

Or will they rapidly become indistinguishable since they both get the job done?

asadm 1 hour ago||

if it can pay my rent, why not?

awestroke 3 hours ago||

I predict they will release it as soon as Opus 4.6 is no longer in the lead. They can't afford to fall behind. And they won't be able to make a model that is intelligent in every way except cybersecurity, because that would decrease general coding and SWE ability

chippiewill 2 hours ago|

Alternatively they'll just wreck it down a bit so it beats a competitor but isn't unsafe.

Stevvo 2 hours ago||

"Claude Mythos Preview’s large increase in capabilities has led us to decide not to make it generally available."

Disappointing that AGI will be for the powerful only. We are heading for an AI dystopia of Sci-Fi novels.

girvo 42 minutes ago||

Not surprising though, this was always going to be the end result within our current systems I think. When you add up scaling power and required cost, then how talent concentrates in our economic systems, we were always going to end up with monopolies.

Unless governments nationalise the companies involved, but then there’s no way our governments of today give this power out to the masses either.

gom_jabbar 25 minutes ago||

Expected outcome. Nick Land and the CCRU have explored how capitalism operationalizes science fiction (distilled in the concept of Hyperstition). Viewed through this lens, prices encode "distributed SF narratives." [0]

[0] Nick Land (1995). No Future in Fanged Noumena: Collected Writings 1987-2007, Urbanomic, p. 396.

LoganDark 3 hours ago||

> Claude Mythos Preview’s large increase in capabilities has led us to decide not to make it generally available.

Shame. Back to business as usual then.

Tepix 2 hours ago|

I for one applaud them for being cautious.

LoganDark 2 hours ago||

Being cautious is fine. Farming hype around something that may as well not exist for us should be discouraged. I do appreciate the research outputs.

juleiie 2 hours ago||

Honestly if that was some kind of research paper, it would be wholly insufficient to support any safety thesis.

They even admit:

"[...]our overall conclusion is that catastrophic risks remain low. This determination involves judgment calls. The model is demonstrating high levels of capability and saturates many of our most concrete, objectively-scored evaluations, leaving us with approaches that involve more fundamental uncertainty, such as examining trends in performance for acceleration (highly noisy and backward-looking) and collecting reports about model strengths and weaknesses from internal users (inherently subjective, and not necessarily reliable)."

Is this not just an admission of defeat?

After reading this paper I don't know if the model is safe or not, just some guesses, yet for some reason catastrophic risks remain low.

And this is for just an LLM after all, very big but with no persistent memory or continuous learning. Imagine an actual AI that improves itself every day from experience. It would be impossible to have the slightest clue about its safety, not even this nebulous statement we have here.

Any such future model architecture would essentially be Russian roulette, with the number of bullets decided by initial alignment efforts.

ansc 3 hours ago|

Congratulations to the US military, I guess.

jjice 2 hours ago|

Doesn't Anthropic not have that contract anymore, after all that buzz a month or so ago?

laweijfmvo 1 hour ago|||

The US has invaded two sovereign countries this year to take their oil. I assume taking over a US company for their AI model would be trivial.

wmf 2 hours ago|||

The point of that buzz was to force Anthropic to provide Mythos to the military.

jjice 2 hours ago||

Yeah but I thought they lost the contract, so that's my confusion with the parent's comment, which seemed to me to see this as something that the US military would benefit from. Maybe I misinterpreted?
