
Posted by fittingopposite 7 hours ago

Mercury 2: The fastest reasoning LLM, powered by diffusion(www.inceptionlabs.ai)
137 points | 74 comments
tl2do 6 hours ago|
Genuine question: what kinds of workloads benefit most from this speed? In my coding use, I still hit limitations even with stronger models, so I'm interested in where a much faster model changes the outcome rather than just reducing latency.
layoric 6 hours ago||
I think it would assist in exploring multiple solution spaces in parallel. With the right user in the loop, plus a harness wrapping tools like compilers, static analysis, and tests, it could iterate very quickly on multiple solutions. An example might be "I need to optimize this SQL query," pointed at a locally running Postgres. Multiple changes could be tested and combined, using EXPLAIN plans to validate performance and a test for correct results. Then only valid solutions would be presented to the developer for review. I don't personally care about the model's 'opinion' or recommendations; using them for architectural choices is, IMO, a flawed use of a coding tool.

It doesn't change the fact that the most important thing is verification/validation of their output, whether from tools or from a developer reviewing and making decisions. But even if you don't want that approach, diffusion models just seem a lot more efficient. I'm interested to see whether they are simply a better match for common developer tasks when paired with validation/verification systems, rather than just writing (likely wrong) code faster.
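The verify-everything loop described above can be sketched in a few lines. This is a hypothetical illustration, not a real tool: `run_query` and `estimate_cost` stand in for a real Postgres connection (e.g. running the query and reading an EXPLAIN plan), and `pick_valid_rewrites` is an invented name.

```python
# Sketch of the harness: candidate rewrites from a fast model are kept
# only if they return the same rows as the original query AND beat it
# on estimated cost. Everything else is discarded before a human sees it.

def pick_valid_rewrites(original, candidates, run_query, estimate_cost):
    baseline_rows = sorted(run_query(original))
    baseline_cost = estimate_cost(original)
    survivors = []
    for sql in candidates:
        if sorted(run_query(sql)) != baseline_rows:
            continue  # wrong results: never shown to the developer
        cost = estimate_cost(sql)
        if cost < baseline_cost:
            survivors.append((cost, sql))
    # cheapest valid rewrites first, for human review
    return [sql for _, sql in sorted(survivors)]

# Toy stand-ins so the sketch runs without a database:
FAKE_DB = {
    "SELECT * FROM t": [1, 2, 3],
    "SELECT * FROM t WHERE 1=1": [1, 2, 3],
    "SELECT * FROM t LIMIT 2": [1, 2],
}
FAKE_COST = {
    "SELECT * FROM t": 100.0,
    "SELECT * FROM t WHERE 1=1": 40.0,
    "SELECT * FROM t LIMIT 2": 5.0,
}

best = pick_valid_rewrites(
    "SELECT * FROM t",
    ["SELECT * FROM t WHERE 1=1", "SELECT * FROM t LIMIT 2"],
    run_query=FAKE_DB.__getitem__,
    estimate_cost=FAKE_COST.__getitem__,
)
print(best)  # the LIMIT rewrite is dropped (wrong rows); the other survives
```

A fast model matters here because the loop is generate-and-discard: most candidates are thrown away, so cheap generation is what makes trying many of them practical.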

cjbarber 6 hours ago|||
I've tried a few computer use and browser use tools and they feel relatively tok/s bottlenecked.

And in some sense, all of my claude code usage feels tok/s bottlenecked. There's never really a time where I'm glad to wait for the tokens, I'd always prefer faster.

volodia 4 hours ago|||
There are a few: fast agents, deep research, real-time voice, coding. The other thing is that when you have a fast reasoning model, you can spend more effort on thinking within the same latency budget, which pushes up quality.
corysama 3 hours ago|||
Coding auto-complete?
irthomasthomas 6 hours ago|||
multi-model arbitration, synthesis, parallel reasoning etc. Judging large models with small models is quite effective.
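The arbitration pattern mentioned here can be sketched as follows. All names and "models" are stubs for illustration; a real client for any provider would slot in the same way.

```python
# Sketch of small-model arbitration: several large models answer in
# parallel, and a fast judge model picks the winner by index.
from concurrent.futures import ThreadPoolExecutor

def arbitrate(question, workers, judge):
    with ThreadPoolExecutor() as pool:
        answers = list(pool.map(lambda m: m(question), workers))
    return answers[judge(question, answers)]  # judge returns best index

# Stub "models": two generalists and a judge that, purely for this toy,
# prefers the shorter (more direct) answer. A real judge would be a
# fast LLM call scoring the candidates.
model_a = lambda q: "The answer is 4."
model_b = lambda q: "Well, considering the question at length, it might be 4."
judge = lambda q, answers: min(range(len(answers)), key=lambda i: len(answers[i]))

print(arbitrate("What is 2+2?", [model_a, model_b], judge))
# -> "The answer is 4."
```

The judge runs once per question on short inputs, so it sits on the latency-critical path; that is exactly where a very fast model pays off.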
quotemstr 4 hours ago||
Once you make a model fast and small enough, it starts to become practical to use LLMs for things as mundane as spell checking, touchscreen-keyboard tap disambiguation, and database query planning. If the fast, small model is multimodal, use it in a microwave to make a better DWIM auto-cook.

Hell, want to do syntax highlighting? Just throw buffer text into an ultra-fast LLM.

It's easy to overlook how many small day-to-day heuristic schemes can be replaced with AI. It's almost embarrassing to think about all the totally mundane uses to which we can put fast, modest intelligence.
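One of the mundane uses above, touchscreen tap disambiguation, can be sketched like this. `score_word` is a stub for a tiny fast LLM scoring candidates in context; `disambiguate` and the vocabulary are invented for illustration.

```python
# Toy tap disambiguation: each touch yields a set of plausible keys,
# and a small model scores the candidate words given the context.
from itertools import product

def disambiguate(key_sets, score_word, context):
    candidates = ("".join(keys) for keys in product(*key_sets))
    return max(candidates, key=lambda w: score_word(context, w))

# Stub scorer: a real system would ask the model for P(word | context).
VOCAB = {"cat": 0.9, "car": 0.5, "vat": 0.1}
score = lambda ctx, w: VOCAB.get(w, 0.0)

taps = [{"c", "v"}, {"a"}, {"t", "r"}]  # ambiguous touches near c/v, a, t/r
print(disambiguate(taps, score, "the sat on the mat"))  # -> "cat"
```

This only becomes viable when a single scoring call costs milliseconds, which is the point being made: speed unlocks use cases, not just lower latency on existing ones.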

chriskanan 4 hours ago||
I can see some promise with diffusion LLMs, but getting them comparable to the frontier is going to require a ton of work, and these closed-source solutions probably won't really invigorate the field to find breakthroughs. It's too bad that they are following OpenAI's path of closed models with no published details, as far as I can tell.
mhitza 5 hours ago||
Comment retracted. My bad, missed some details.
selcuka 5 hours ago||
I think your comment is a bit unfair.

> no reasoning comparison

Benchmarks against reasoning models:

https://www.inceptionlabs.ai/blog/introducing-mercury-2

> no demo

https://chat.inceptionlabs.ai/

> no info on numbers of parameters for the model

This is a closed model. Do other providers publish the number of parameters for their models?

> testimonials that don't actually read like something used in production

Fair point.

volodia 4 hours ago|||
Just to clarify one point: Mercury (the original v1, non-reasoning model) is already used in production in mainstream IDEs like Zed: https://zed.dev/blog/edit-prediction-providers

Mercury v1 focused on autocomplete and next-edit prediction. Mercury 2 extends that into reasoning and agent-style workflows, and we have editor integrations available (docs linked from the blog). I’d encourage folks to try the models!

mhitza 4 hours ago|||
You are right; I edited my post (twice, actually). I missed the chat the first time around (though it's hard to see it as a reasoning model when the chain of thought is hidden, or at least not obvious; I guess this is the new normal), and I also missed the reasoning table because the text is pretty small on mobile and I thought it was another speed benchmark.
selcuka 1 hour ago||
I tried their chat demo again, and if you set reasoning effort to "High", you sometimes see the chain of thought before the answer (click the "Thought for n seconds" text to expand it).

That being said, the chain is pretty basic. It's possible that they don't disclose the full chain of thought.

pants2 5 hours ago||
Reading such obvious LLM-isms in the announcement just makes me cringe a bit too, ex.

> We optimize for speed users actually feel: responsiveness in the moments users experience — p95 latency under high concurrency, consistent turn-to-turn behavior, and stable throughput when systems get busy.

exabrial 4 hours ago||
I believe Jimmy Chat is still faster by an order of magnitude…
poly2it 3 hours ago|
What does Jimmy Chat have to do with diffusion models?
lprimeisafk 4 hours ago||
It fails the car wash test
emi2k01 4 hours ago|
The test:

> The car wash is 100m away from my house. Should I walk or drive?

I didn't find a way to share the conversation, so here it is the full response in `medium` settings:

---

Walking 100 m is a very short distance—roughly a minute or two at a normal pace. In most situations, walking will be quicker overall because you avoid the time needed to get into the car, start it, and find parking. It also saves fuel and is better for the environment.

*Consider these factors before deciding:*

| Factor | Walking | Driving |
|--------|---------|---------|
| *Time* | ~1–2 min (plus a few seconds to step out) | ~1 min driving, plus ~1–2 min to get in/out and park |
| *Energy/Fuel* | None | Uses a small amount of fuel |
| *Weather* | Comfortable if dry & mild | Better if it's raining, very hot, or icy |
| *Physical ability* | Easy for most people | Not needed for such a short trip |
| *Car availability* | Not required | Must have a car nearby and be willing to use it |
| *Environmental impact* | Zero emissions | Small emissions |

*Recommendation:* If the weather is pleasant and you don’t need the car for anything else right away, walking is the simplest, fastest, and most eco‑friendly choice. Drive only if you’re dealing with inclement weather, have heavy items to carry, or need the car immediately after the wash.

Do you have any specific constraints (e.g., rain, heavy bags, time pressure) that might affect the decision?

rtfeldman 4 hours ago||
If a stranger asks me, "Should I walk or drive to this car wash?" then I assume they're asking in good faith and both options are reasonable for their situation. So it's a safe assumption that they're not going there to get their car washed. Maybe they're starting work there tomorrow, for example, and don't know how pedestrian-friendly the route is.

Is the goal behind evaluating models this way to incentivize training them to assume we're bad-faith tricksters even when asking benign questions like how best to traverse a particular 100m? I can't imagine why it would be desirable to optimize for that outcome.

(I'm not saying that's your goal personally - I mean the goal behind the test itself, which I'd heard of before this thread. Seems like a bad test.)

zamalek 3 hours ago||
> I need to get my car washed; should I drive or walk to the car wash that is 100m away?

> Walking 100 m is generally faster, cheaper, and better for the environment than driving such a short distance. If you have a car that’s already running and you don’t mind a few extra seconds, walking also avoids the hassle of finding parking or worrying about traffic.

rtfeldman 56 minutes ago||
That's a much better test!
davistreybig 3 hours ago||
This is unbelievably fast
dw5ight 4 hours ago||
this looks awesome!!
MarcLore 4 hours ago||
[dead]
dhruv3006 4 hours ago||
I am a little underwhelmed by anything diffusion at the moment; they didn't really deliver.
quotemstr 4 hours ago|
What isn't these days? I've found it pointless to get upset about it.
dhruv3006 4 hours ago||
We need a new architecture. I wonder what Ilya is cooking.
arjie 5 hours ago|
Please pre-render your website on the server. Client-side JS means that my agent cannot read the press release, and that reduces the chance I am going to read it myself. Also, day-one availability on OpenRouter increases the chance that someone will try it.