EDIT: Found something here https://dev.classmethod.jp/en/articles/sakana-fugu-ga-first-...
The reasoning chains could have been used, and the resulting combined model could easily and effectively have been distilled.
For others looking around: LCF is a meme model, it's not real. It's a joke.
We open sourced it all and will be releasing a similar orchestrator next week on TrustedRouter.
Looks like Fusion calls a bunch of models, then uses an LLM to synthesize the results and passes that to another model for the final output.
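A rough sketch of that fan-out, synthesize, finalize flow as I read it, not OpenRouter's actual implementation. `fuse`, the `Model` type, the prompt wording and the stub models are all made up for illustration; in practice each callable would wrap a real API call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# Hypothetical model interface: prompt in, text out.
Model = Callable[[str], str]


def fuse(prompt: str, panel: List[Model], synthesizer: Model, finalizer: Model) -> str:
    # Fan out: send the same prompt to every panel model in parallel.
    with ThreadPoolExecutor(max_workers=len(panel)) as pool:
        drafts = list(pool.map(lambda m: m(prompt), panel))
    # Synthesize: one model merges the numbered drafts.
    numbered = "\n".join(f"[{i + 1}] {d}" for i, d in enumerate(drafts))
    synthesis = synthesizer(
        f"Question: {prompt}\nCandidate answers:\n{numbered}\nMerge them."
    )
    # Final pass: another model writes what the user sees.
    return finalizer(
        f"Question: {prompt}\nSynthesis: {synthesis}\nWrite the final answer."
    )


# Stub models (assumptions) so the sketch runs offline.
def model_a(p: str) -> str:
    return "Paris"


def model_b(p: str) -> str:
    return "Paris, France"


def synth(p: str) -> str:
    # Counts the "[n]" markers to show it received every draft.
    return "SYNTH(" + str(p.count("[")) + " drafts)"


def final(p: str) -> str:
    return "FINAL<" + p.split("Synthesis: ")[1].split("\n")[0] + ">"


result = fuse("Capital of France?", [model_a, model_b], synth, final)
print(result)
```

Every request costs the whole panel plus two more calls, which is where the extra tokens and latency come from.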
Fugu looks like it's doing something different? Using an LLM earlier on in the flow as an orchestrator to decide which other LLMs to call. More of a coordinator than a simple synthesizer of results, and more "agentic".
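For contrast, a minimal sketch of the orchestrator-first pattern, assuming the router picks workers by name and then combines whatever comes back. This is a guess at the general shape, not Fugu's real design; `orchestrate`, the comma-separated plan format and the stub workers are hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical model interface: prompt in, text out.
Model = Callable[[str], str]


def orchestrate(prompt: str, router: Model, workers: Dict[str, Model],
                max_calls: int = 3) -> str:
    # The router sees the prompt first and names the workers to call.
    plan = router(f"Pick workers from {sorted(workers)} for: {prompt}")
    chosen = [n.strip() for n in plan.split(",") if n.strip() in workers][:max_calls]
    # Only the chosen workers run; the rest cost nothing.
    notes = [f"{name}: {workers[name](prompt)}" for name in chosen]
    # The router composes the final reply from the workers' output.
    return router("Combine:\n" + "\n".join(notes))


# Stub workers and router (assumptions) so the sketch runs offline.
calls: List[str] = []


def coder(p: str) -> str:
    calls.append("coder")
    return "def add(a, b): return a + b"


def poet(p: str) -> str:
    calls.append("poet")
    return "roses are red"


def router(p: str) -> str:
    if p.startswith("Pick workers"):
        return "coder" if "function" in p else "poet"
    return p[len("Combine:\n"):]


answer = orchestrate("Write a function that adds two numbers", router,
                     {"coder": coder, "poet": poet})
print(answer)
print(calls)
```

The difference from the fusion flow is that the LLM decides the fan-out before any worker runs, so only some of the models get called per request.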
It's interesting because it's all exposed behind a single OpenAI-compatible endpoint (Responses API?), so presumably someone could use this as the model for one of their own agents. Now you have an agent of agents, nested in some sense. The token usage increases accordingly!
What's nice is that OpenRouter included a Pareto graph showing the cost as well as the performance. (But not time, unfortunately -- model fusion adds a large factor to round trip time.) Benchmarks are a lot less helpful without cost data.
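If you want to build that kind of cost/performance chart for your own runs, here is a minimal sketch of picking out the Pareto frontier. The model names and numbers are made-up placeholders, not figures from the OpenRouter post, and `pareto_frontier` is just an illustrative helper.

```python
from typing import List, Tuple

# (name, cost, score): lower cost is better, higher score is better.
Point = Tuple[str, float, float]


def pareto_frontier(points: List[Point]) -> List[Point]:
    # Sort by cost ascending; on equal cost, best score first.
    ordered = sorted(points, key=lambda p: (p[1], -p[2]))
    frontier: List[Point] = []
    best_score = float("-inf")
    for p in ordered:
        # Keep a point only if it beats every cheaper point on score.
        if p[2] > best_score:
            frontier.append(p)
            best_score = p[2]
    return frontier


# Placeholder data (assumption), purely to exercise the function.
runs = [
    ("small", 1.0, 60.0),
    ("mid", 3.0, 72.0),
    ("pricey", 8.0, 70.0),
    ("fused", 10.0, 80.0),
]
names = [name for name, _, _ in pareto_frontier(runs)]
print(names)
```

Latency could be added as a third axis the same way, though the sort-and-scan trick only works for two dimensions.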
OpenRouter: Surpassing frontier performance with fusion (blog post with benchmarks)
https://news.ycombinator.com/item?id=48525392
OpenRouter Fusion API
https://news.ycombinator.com/item?id=48537641
See also: Sibling comment with an open source implementation
https://news.ycombinator.com/item?id=48624782#48629598
I did my own last weekend in a few lines of Python, though I haven't tested it much yet. (Looking for some very hard, very cheap benchmarks, if such a thing exists!)