Sakana Fugu - Hacker News

Posted by Finbarr 1 day ago

232 points | 121 comments | page 2

eevmanu 1 day ago|

Reminds me of <https://github.com/irthomasthomas/llm-consortium>

eevmanu 1 day ago|

Fugu Ultra <https://console.sakana.ai/models#fugu-ultra> sounds similar to GPT-5.5 Pro or Gemini 3.1 Deep Think.

Is there any official source that could confirm whether Fable (or Mythos) is parallelized test-time compute (like GPT-5.5 Pro) or a sparse Mixture-of-Experts (MoE) transformer combined with a multi-agent, inference-time compute scaling architecture (like Gemini 3.1 Deep Think)?

personjerry 1 day ago||

I love when they put a black box in front of the other black boxes so I can get a questionably better black box for slower service and more money!

schmuhblaster 21 hours ago||

I’ve been working on my own harness / “orchestration layer”, not with the goal of reaching frontier level performance, but rather boosting performance of smaller (locally hostable) models. Unfortunately, I don’t have VC money to burn on running hundreds of evals, but some preliminary results do indicate that it could work[0].

[0] https://deepclause.substack.com/p/how-to-make-small-models-p...

GolfPopper 1 day ago||

This is a joke, right?

NitpickLawyer 1 day ago|

Not necessarily. There were some tests last year-ish from hf that showed that simply alternating (randomly) between claude and gpt (whatever their versions were at the time) on a task produced better results than either of them individually. So during a task, the first call was sent to one, then the other and so on.
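For illustration only, here is a minimal Python sketch of that alternation idea. It assumes each model can be wrapped as a function that takes the conversation so far and returns the next message; `run_alternating`, `model_a`, and `model_b` are hypothetical names, and the stubs stand in for real API clients. It is not the setup used in the tests mentioned above.

```python
import random
from typing import Callable, List

# Assumption: a "model" is any callable mapping the history to the next message.
Model = Callable[[List[str]], str]

def run_alternating(models: List[Model], task: str, steps: int, seed: int = 0) -> List[str]:
    """Run a multi-step task, picking a model at random for every call."""
    rng = random.Random(seed)          # seeded so runs are reproducible
    history = [task]
    for _ in range(steps):
        model = rng.choice(models)     # random alternation between providers
        history.append(model(history)) # every model sees the shared history
    return history

# Hypothetical stubs; a real version would call two different providers.
def model_a(history: List[str]) -> str:
    return f"A sees {len(history)} messages"

def model_b(history: List[str]) -> str:
    return f"B sees {len(history)} messages"

transcript = run_alternating([model_a, model_b], "solve the task", steps=4)
```

The point of the shared `history` is that neither model knows (or needs to know) that the previous turn came from a different model.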

There's also the concept of "smart routing" requests based on some heuristics / embeddings. You'd get "simple" tasks handled by smaller (cheaper) models and use a bigger model to curate / sort / merge the results.
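A rough sketch of heuristic routing, under loud assumptions: the keyword list, the 50-word threshold, and the names `estimate_difficulty` and `route` are all made up for illustration, and the stub lambdas stand in for real model calls. A production router would more likely use embeddings or a trained classifier, and the curate/merge step by a bigger model is left out here.

```python
from typing import Callable, Dict

def estimate_difficulty(prompt: str) -> str:
    """Toy heuristic (assumption): long prompts or certain keywords count as hard."""
    hard_words = ("prove", "refactor", "security", "derive")
    is_long = len(prompt.split()) > 50
    has_hard_word = any(word in prompt.lower() for word in hard_words)
    return "hard" if is_long or has_hard_word else "simple"

def route(prompt: str, models: Dict[str, Callable[[str], str]]) -> str:
    """Send the prompt to whichever model is registered for its difficulty tier."""
    return models[estimate_difficulty(prompt)](prompt)

# Hypothetical stubs for a cheap model and an expensive one.
models = {
    "simple": lambda p: "small-model: " + p,
    "hard": lambda p: "big-model: " + p,
}

easy_answer = route("What is 2+2?", models)
hard_answer = route("Derive the closed form of this sum", models)
```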

There's a lot of things to try here. I wouldn't personally pay for this service, but I don't think it's "a joke"...

andai 1 day ago||

See also: Agents built from alloys (July 2025)

https://news.ycombinator.com/item?id=44630724

They randomly alternated between frontier LLMs and got a massive boost to performance on cybersecurity tasks.

david_shi 1 day ago||

Their research around building a domain-specific model is pretty cool. It's kind of like Karpathy's autoresearch, but pointed at deciding the optimal model to use at each step of inference.

If cost becomes an even bigger problem, being able to choose between "best performance possible" and "strong but cost effective" will be useful.

https://arxiv.org/pdf/2512.04695

jordemort 1 day ago||

Fugu, eh? So there’s a nonzero chance this thing might kill me?

slopdetector 17 hours ago||

For anyone finding this, I used this during the beta. Beats GPT-5.5 xhigh on complex tasks. Since it’s expensive and difficult to subsidize, use it for the most challenging problems.

OAI/ANT can subsidize their own subscriptions, so it’s hard to compete there. But the results I got from fugu-ultra were impressive.

monkeydust 1 day ago||

Imho there are two dimensions here: first, the different LLMs, and second, the strategy by which you break down the problem in an agentic fashion (e.g. split it across separate agents, each with its own persona, and then have a judge evaluate across all agents). You can of course mix the dimensions as well, and that's what I have been tinkering* with for a good few months with some success. This was all done using a home-brew setup running on OpenRouter.

Personally I prefer understanding the dimensions and their interplay and controlling it myself, though I can see why OpenRouter and others are now offering this as a packaged solution.

Just be careful when you start outsourcing too much of your intelligence needs to a blackbox.

* https://github.com/monkeydust/rightmind
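To make the persona-plus-judge strategy described above concrete, here is a minimal Python sketch. It is not taken from the linked repository; `persona_panel`, the prompt wording, and the stub `agent` and `judge` functions are assumptions for illustration, with plain callables standing in for LLM calls.

```python
from typing import Callable, List

# Assumption: a "model" is any callable mapping a prompt to a reply.
Model = Callable[[str], str]

def persona_panel(question: str, personas: List[str], agent: Model, judge: Model) -> str:
    """Ask the same question under several personas, then let a judge reconcile."""
    opinions = [agent(f"You are {p}. {question}") for p in personas]
    numbered = "\n".join(f"{i + 1}. {o}" for i, o in enumerate(opinions))
    return judge(f"Question: {question}\nOpinions:\n{numbered}\nReconcile these.")

# Hypothetical stubs: the agent echoes its persona, the judge counts opinions.
def agent(prompt: str) -> str:
    return prompt.split(".")[0]

def judge(prompt: str) -> str:
    return f"verdict over {prompt.count('You are')} opinions"

verdict = persona_panel(
    "Should I move?",
    ["a cautious accountant", "an optimistic founder", "a skeptical friend"],
    agent,
    judge,
)
```

Using a different underlying model per persona would cover the other dimension (different LLMs) with the same structure.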

mannanj 1 day ago|

This is interesting. Would you share a few ways in which you're using this in your workflow? And if you were to start a new project, testing and building it out from scratch, how would you work this approach in without bogging everything down (including the simple things) with overanalysis?

monkeydust 15 hours ago||

I only use this for high-value problems/challenges. A lot of them relate to life decisions, including work, where to live, finances, etc. It gives me a small army that can break down and slice and dice the problem in different ways, then someone to reconcile it all and present it back to me. The variance in their opinions is the most interesting part of this project so far.

mannanj 5 hours ago||

Thanks. I like the idea of doing this myself, and I'm glad you see value in this.

Do you not worry about giving away your most intimate data to for-profit companies who have not signed to protect your data in a dignified fashion?

ed_mercer 1 day ago||

So basically... openrouter?

alasano 1 day ago||

OpenRouter Fusion is basically ask N models + synthesizer step.

This instead asks a special orchestrator they built, which sits in front of a bunch of models, which model would suit the request best.

Regular Fugu seems to be just "pick the best model and route the request there"

Fugu Ultra can instead generate a little mini workflow/plan to achieve a result, e.g.:

1. Ask GPT to derive the math.
2. Ask Opus to check for implementation/security issues.
3. Ask Gemini to synthesize or resolve disagreement.
4. Return final answer.

I could be wrong, but that's how it seems at a glance, so I think it's more dynamic than OpenRouter Fusion.
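As a guess at the shape of such a mini workflow (not Sakana's actual implementation), the four-step example above could be sketched in Python like this. `run_plan`, the plan format, and the stub models are hypothetical; in a real orchestrator the plan itself would be generated per request rather than hard-coded.

```python
from typing import Callable, Dict, List, Tuple

Model = Callable[[str], str]
Plan = List[Tuple[str, str]]  # (model name, instruction) pairs, run in order

def run_plan(plan: Plan, models: Dict[str, Model], request: str) -> str:
    """Run each step on its assigned model, feeding earlier outputs forward."""
    outputs: List[str] = []
    for model_name, instruction in plan:
        context = "\n".join(outputs)
        prompt = f"{instruction}\nRequest: {request}\nPrior steps:\n{context}"
        outputs.append(models[model_name](prompt))
    return outputs[-1]  # the last step's output is the final answer

# Hypothetical stubs that record the call order instead of hitting real APIs.
calls: List[str] = []

def make_stub(name: str) -> Model:
    def stub(prompt: str) -> str:
        calls.append(name)
        return f"{name} output"
    return stub

models = {name: make_stub(name) for name in ("gpt", "opus", "gemini")}
plan = [
    ("gpt", "Derive the math."),
    ("opus", "Check for implementation/security issues."),
    ("gemini", "Synthesize or resolve disagreement."),
]
answer = run_plan(plan, models, "example request")
```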

Npovview 1 day ago|||

Then there is this as well.

https://www.databricks.com/blog/introducing-omnigent-meta-ha...

runeblaze 1 day ago|||

links to two papers with at least enough apparent quality and novelty to get into ICLR 2026

> So basically... openrouter

:skull:

i now really wonder how many people of the public understood my thesis defense lol

mark_l_watson 1 day ago|

Nice idea but expensive. It looks like they don’t add very low cost models like DeepSeek v4 flash into their mix.

After a few months of spending money on the best frontier models, now I am spending time using DeepSeek v4 flash as my workhorse, and flipping to more capable (but still very inexpensive) open models on an as-needed basis. We all make our own tool selection decisions, but for me, I feel happier and enjoy working more following the very fast response and ultra low cost path.

ljlolel 1 day ago|

We found that an all-open-source fusion was 1/3 the price of Fable and performed better.

https://trustedrouter.com/blog/open-fusion-beats-fable-5

andai 1 day ago||

Brilliant. What this actually is, is a swarm, albeit a very small one. I'm wondering if for research specifically, swarm size (on higher temp?) would outweigh model size.

At least, for the initial data gathering phase. You'd probably want a sequence of progressively larger models to filter it.

Have you guys tested it on anything other than research?

ljlolel 1 day ago||

Working on bio coding and cybersecurity benchmarks now
