
Posted by mindingnever 20 hours ago

Will It Mythos?(swelljoe.com)
305 points | 216 comments | page 2
snthpy 11 hours ago|
I miss Fable. Will it ever be back? As a non-US citizen living in Africa, I fear that I will have to wait for an equivalent non-US model.
vessenes 11 hours ago||
I think you'll find other labs are racing to get you something while Anthropic works through their issues. So, yes, give it a few months, you'll have something equivalent from someone somewhere in the world.
lukaslalinsky 9 hours ago||
My hope is that Opus 5 will be released soon, basically a rebranded Fable.
draginol 10 hours ago||
I was pretty impressed with Fable when I used it. Fable on Low was better than Opus 4.8 on High (and cheaper).

Now, for me, it was really about how well it worked on big, existing, human-made codebases. I was working on some new screens in GalCiv IV, and if you've ever had to make screens for games, you know it is incredibly tedious, low-brain work. GPT 5.5 and Opus 4.8 would struggle with these over and over again, and this is C++ work with limited hotloading, so it's a slow process. Fable nailed these screens fast.

stared 16 hours ago||
For malware detection, many models are biased for or against detecting a threat (likely a thing that can be adjusted with a prompt).

I suggest tasks whose answers cannot be guessed (find, not tell), plus 2D charts for both ROC and pricing; see https://quesma.com/benchmarks/binaryaudit/
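
To make the ROC suggestion concrete, here is a minimal sketch of how an ROC curve separates a model's detection bias from its actual discriminating power. The scores and labels are hypothetical placeholders, not results from any benchmark, and the function names `roc_points` and `auc` are illustrative only.

```python
from typing import List, Tuple


def roc_points(scores: List[float], labels: List[int]) -> List[Tuple[float, float]]:
    """Return (false_positive_rate, true_positive_rate) pairs, one per threshold.

    scores: model confidence that a sample is malicious (higher = more suspicious).
    labels: ground truth, 1 for malicious and 0 for benign.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing is flagged
    # Sweep the threshold from strictest to most permissive.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve, using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical data: six binaries, three of them actually malicious.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]

curve = roc_points(scores, labels)
print(curve)
print(auc(curve))
```

A model that is merely biased toward "threat" or "no threat" moves along the same curve as the threshold changes, while a genuinely better detector pushes the whole curve toward the top-left corner.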

GeorgeWoff25 18 hours ago||
Spatial reasoning is where Fable really separates itself, imo.
dlenski 5 hours ago||
Quite honestly, this is the most interesting and useful thing that I have ever read, directly responsive to the question of "how good are LLMs at doing difficult tasks, in terms of both bang-for-the-buck and in terms of raw performance?"

My hat's off to swelljoe.

This part was especially interesting:

> The cheap Chinese models kick ass. MiMo and DeepSeek are directly competitive with Opus 4.8 and GPT 5.5 at roughly an order of magnitude lower price. There have been accusations of “benchmaxxing” with the Chinese models, but I don’t think there’s any reasonable way for the models to already be tuned for these very recently disclosed bugs. I think they’re genuinely becoming competitive with the frontier from Anthropic and OpenAI. If you’re in a hurry, DeepSeek was the fastest, on average, while finding 4/9 bugs. And, if you’re cheap, MiMo found bugs as well as any model for the lowest price.

wrs 8 hours ago||
IIRC from the Anthropic report, the alleged danger of Mythos isn’t that it finds more vulnerabilities than previous models, but that it’s significantly more successful at exploiting them. Which this doesn’t seem to test.
seizethecheese 6 hours ago|
I would naively expect finding and exploiting to be related. Leaving this comment so someone can correct it, which would be interesting.
himata4113 17 hours ago||
What makes Mythos special is the fact that someone with zero expertise in the field could find and weaponize a zero-day. Real threat actors already use LLMs en masse, and the recent advancements with glm-5.2 will probably enable way more cyber attacks than Fable ever could.
matheusmoreira 14 hours ago|
We can also use LLMs en masse to find and fix the zero-days. I've definitely been using LLMs to audit my own computers.
sfjailbird 11 hours ago||
[flagged]
dang 6 hours ago||
Can you please not break the site guidelines like this? They include:

"Don't be snarky."

"Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something."

"Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize. Assume good faith."

"Don't be curmudgeonly. Thoughtful criticism is fine, but please don't be rigidly or generically negative."

https://news.ycombinator.com/newsguidelines.html

hosel 11 hours ago|||
Is there any evidence that they nerf models? Anthropic is set to post a profit in Q2 2026 (which is actually not ideal), but there is profit.
kingofthehill98 10 hours ago|||
Only if you buy their math, which basically is "hey, if we don't do any training we can actually make a profit".

The problem with that math is that if they don't do any training, they would be out of the market in 12 months. They're only relevant ("profitable") precisely because they trained the current reference SOTA model.

They can't just release Mythos and sit on top of it forever; competition is catching up fast and people expect a new, more powerful model every 6 months.

imxyy_soope_ 10 hours ago||||
There are LLM performance trackers in the wild, for instance https://marginlab.ai

You may notice that the performance of the old model tends to decline before each new model release.

slipnslider 7 hours ago|||
Wasn't that non-GAAP profit?
dd8601fn 10 hours ago|||
How do they “nerf the models”?

Are they quietly compacting context to reduce kv cache usage, before the actual compaction? Like there’s a slider for how much to compress it, and that’s never revealed to us?

airstrike 9 hours ago||
I suspect they quantize them, reduce thinking budgets, batch more requests, or all of the above.
lwarfield 7 hours ago||
There's also lowering the number of experts you run in MoE models.
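
For the quantization part of the speculation above, here is a minimal sketch of why quantizing weights saves memory at the cost of precision. It uses symmetric per-tensor int8 quantization on a hypothetical weight list. Nothing here reflects how any provider actually serves its models, and the names `quantize_int8` and `dequantize` are illustrative.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization to the integer range [-127, 127].

    Returns the integer codes and the scale needed to map them back to floats.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest code and clamp to guard against float round-off.
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes: List[int], scale: float) -> List[float]:
    """Map integer codes back to approximate float weights."""
    return [c * scale for c in codes]


# Hypothetical weights; real tensors hold millions of values.
weights = [0.5, -1.27, 0.003, 0.9]

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(q)
print(restored)
print(max_error)
```

Each weight now fits in one byte instead of two or four, but small values such as `0.003` collapse to zero. That kind of loss is what would show up as degraded output if a served model were quantized more aggressively.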
ern_ave 9 hours ago||
> there is no 'profit' step.

You have to learn to think like a drug dealer. The first hit is always free.

Companies and developers are growing more and more dependent on coding agents. Eventually, the owners of the AI will be able to charge whatever they want. What are you going to do? Go back to coding by hand? Do you even remember how?

rirze 6 hours ago||
I'm convinced if Mythos/Fable comes back at this point, it will be guardrailed into lobotomy.

It won't be as good.

tomcam 4 hours ago|
I feel like this could achieve techempower-level legend status