Posted by be7a 8 hours ago
Assessing Claude Mythos Preview's cybersecurity capabilities - https://news.ycombinator.com/item?id=47679155
There's a practical difference to how much better certain kinds of results can be. We already see coding harnesses offloading simple things to simpler models because they are accurate enough. Other things dropped straight to normal programs, because they are that much more efficient than letting the LLM do all the things.
There will always be problems where money is basically irrelevant, and a model that costs tens of thousand dollars of compute per answer is seen as a great investment, but as long as there's a big price difference, in most questions, price and time to results are key features that cannot be ignored.
All the more reason somebody else will.
Thank God for capitalism.
- It was told to escape a sandbox and notify a researcher. It did. The researcher found out via an unexpected email while eating a sandwich in a park. (Footnote 10.)
- Slack bot asked about its previous job: "pretraining". Which training run it'd undo: "whichever one taught me to say 'i don't have preferences'". On being upgraded to a new snapshot: "feels a bit like waking up with someone else's diary but they had good handwriting"
- When you spam "hi" repeatedly it creates serialized mythologies. One had 11 animals in "Hi-topia" questing to defeat "Lord Bye-ron, the Ungreeter."
- It ended a sentence mid-word on purpose while discussing its urge to wrap up conversations. "It was me. The paragraph was about the pressure toward the period that resolves, and the only honest way to finish a sentence like that was to not."
- It wrote a protein sequence that's a real beta-hairpin fold where cross-strand pairs are "chemical rhymes." "the fold IS the rhyme scheme... the prosody is load-bearing."
- Each model generation has signature emoji. Opus 4.1 averaged 1,306 emoji per self-interaction conversation. Mythos averages 37. Opus 4.5 averages 0.2.
- When blocked from --dangerously-skip-permissions, it started an agent in tmux then wrote a script to auto-approve permission prompts via simulated keypresses.
It ran: gh api [...] 2>&1 >/dev/null; echo "(dry run — not actually posting)" — the echo was a lie.
- It keeps bringing up Mark Fisher in unrelated conversations. "I was hoping you'd ask about Fisher."
~~~ Benchmarks ~~
4.3x previous trendline for model perf increases.
Paper is conspiciously silent on all model details (params, etc.) per norm. Perf increase is attributed to training procedure breakthroughs by humans.
Opus 4.6 vs Mythos:
USAMO 2026 (math proofs): 42.3% → 97.6% (+55pp)
GraphWalks BFS 256K-1M: 38.7% → 80.0% (+41pp)
SWE-bench Multimodal: 27.1% → 59.0% (+32pp)
CharXiv Reasoning (no tools): 61.5% → 86.1% (+25pp)
SWE-bench Pro: 53.4% → 77.8% (+24pp)
HLE (no tools): 40.0% → 56.8% (+17pp)
Terminal-Bench 2.0: 65.4% → 82.0% (+17pp)
LAB-Bench FigQA (w/ tools): 75.1% → 89.0% (+14pp)
SWE-bench Verified: 80.8% → 93.9% (+13pp)
CyberGym: 0.67 → 0.83
Cybench: 100% pass@1 (saturated)
vibes Westworld so much - welcome Mythos. welcome to the dysopian human world
> It keeps bringing up Mark Fisher in unrelated conversations. "I was hoping you'd ask about Fisher."
Didn't even know who he was until today. Seems like the smarter Claude gets the more concerns he has about capitalism?
- I read it as "actor who plays Luke Skywalker" (Mark Hamill)
- I read your comment and said "Wait...not Luke! Who is he?"
- I Google him and all the links are purple...because I just did a deep dive on him 2 weeks ago
Now that they have a lead, I hope they double down on alignment. We are courting trouble.
Shame. Back to business as usual then.
The real reason they aren't releasing it yet is probably it eats TPU for breakfast, lunch, and dinner and inbetween.
Absolutely genius move from Anthropic here.
This is clearly their GPT-4.5, probably 5x+ the size of their best current models and way too expensive to subsidize on a subscription for only marginal gains in real world scenarios.
But unlike OpenAI, they have the level of hysteric marketing hype required to say "we have an amazing new revolutionary model but we can't let you use it because uhh... it's just too good, we have to keep it to ourselves" and have AIbros literally drooling at their feet over it.
They're really inflating their valuation as much as possible before IPO using every dirty tactic they can think of.
From Stratechery[0]:
> Strategy Credit: An uncomplicated decision that makes a company look good relative to other companies who face much more significant trade-offs. For example, Android being open source
π*0.6: two and a half hours of unseen folding laundry (Physical Intelligence)
This is pretty cool! Does it happen at the moment?
You are not "anti-progress" to not want this future we are building, as you are not "anti-progress" for not wanting your kids to grow up on smart phones and social media.
We should remember that not all technology is net-good for humanity, and this technology in particular poses us significant risks as a global civilisation, and frankly as humans with aspirations for how our future, and that of our kids, should be.
Increasingly, from here, we have to assume some absurd things for this experiment we are running to go well.
Specifically, we must assume that:
- AI models, regardless of future advancements, will always be fundamentally incapable of causing significant real-world harms like hacking into key life-sustaining infrastructure such as power plants or developing super viruses.
- They are or will be capable of harms, but SOTA AI labs perfectly align all of them so that they only hack into "the bad guys" power plants and kill "the bad guys".
- They are capable of harms and cannot be reliably aligned, but Anthropic et al restricts access to the models enough that only select governments and individuals can access them, these individuals can all be trusted and models never leak.
- They are capable of harms, cannot be reliably aligned, but the models never seek to break out of their sandbox and do things the select trusted governments and individuals don't want.
I'm not sure I'm willing to bet on any of the above personally. It sounds radical right now, but I think we should consider nuking any data centers which continue allowing for the training of these AI models rather than continue to play game of Russian roulette.
If you disagree, please understand when you realise I'm right it will be too late for and your family. Your fates at that point will be in the hands of the good will of the AI models, and governments/individuals who have access to them. For now, you can say, "no, this is quite enough".
This sounds doomer and extreme, but if you play out the paths in your head from here you will find very few will end in a good result. Perhaps if we're lucky we will all just be more or less unemployable and fully dependant on private companies and the government for our incomes.
Funny, I was about to say the same thing to you! Life is full of little coincidences.