Posted by jxmorris12 3 days ago
Seems a buried lede is that on-policy RL is unlocked by bitwise-identical results between training and sampling. I'm not an expert here, but my understanding is that this would allow for stronger guarantees that the policy you sample from matches the policy you train, for the RL training the labs already do.
I don't fully understand the BigMath example though. They show that off-policy RLVR requires off-policy correction, which avoids divergence but is suboptimal because it results in noisy rewards. Then they say "we fixed the sampler and trainer numerical mismatch, which allows for on-policy RL, look how much better it is." It's not clear to me whether this is an artificial example that deliberately uses different trainer/sampler setups, or whether it's actually impossible to get the same numerics between trainer and sampler without their fixes (even with the same batch size, no atomics, etc.).
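For anyone trying to picture what the off-policy correction is doing, here's a minimal sketch, assuming a PPO-style clipped importance ratio and made-up numbers (the tensor names and values are illustrative, not from the paper). The point is that any numerical gap between the sampler's log-probs and the trainer's log-probs shows up as a ratio that isn't exactly 1, whereas bitwise-identical numerics make the correction a no-op:

```python
import torch

# Hypothetical log-probs of the sampled tokens, once under the inference
# engine (sampler) and once under the trainer's forward pass.
sampler_logprobs = torch.tensor([-1.20, -0.45, -2.31])
trainer_logprobs = torch.tensor([-1.19, -0.46, -2.35])  # slight numerical mismatch
advantages = torch.tensor([1.0, -0.5, 1.0])             # made-up advantages

# Off-policy correction: importance ratio between the policy being updated
# (trainer) and the policy that actually produced the samples (sampler).
ratio = torch.exp(trainer_logprobs - sampler_logprobs)

# PPO-style clipping bounds how much the mismatch can distort the update.
clipped = torch.clamp(ratio, 0.8, 1.2)
surrogate = torch.minimum(ratio * advantages, clipped * advantages)
loss = -surrogate.mean()

# If sampler and trainer were bitwise identical, ratio would be exactly 1.0
# everywhere, the clipping would never trigger, and this would reduce to a
# plain on-policy policy-gradient update.
print(ratio, loss)
```

This is just to illustrate the mechanism; the real question in the thread is whether the ratio can ever be made exactly 1 without their kernel-level fixes.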
I've seen this play out dozens of times. So many startups that have come and gone in the Bay Area were composed of extremely talented individuals, yet almost all of them failed.
This is literally one of the most knowledgeable people on the topic. I think you are the one who hasn't peeled back enough layers to connect with what they are saying.
If you say so.
> the author has nothing to do with the original comment
Except for the part of the comment that assumed the author had no idea how any of this works, had only used LLMs through an API, and had never run a local model, you mean?
Not really: LLMs give you a distribution over possible next tokens, and you are free to sample from that distribution however you want. There is no need to hack the RNG or anything; for example, you can simply take a greedy approach and always output the most likely token, in which case the LLM becomes deterministic (mathematically). This is equivalent to setting the temperature to 0.
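Concretely, here's a toy sketch (PyTorch, with a hypothetical helper name) of the difference between greedy decoding and temperature sampling over a logit vector:

```python
import torch

def sample_next_token(logits: torch.Tensor, temperature: float = 1.0) -> int:
    """Pick the next token id from a vector of logits (illustrative helper)."""
    if temperature == 0.0:
        # Greedy: always take the most likely token, deterministic given the logits.
        return int(torch.argmax(logits))
    # Otherwise scale the logits, softmax into probabilities, and draw a sample.
    probs = torch.softmax(logits / temperature, dim=-1)
    return int(torch.multinomial(probs, num_samples=1))

logits = torch.tensor([2.0, 1.0, 0.1])
print(sample_next_token(logits, temperature=0.0))  # always index 0
print(sample_next_token(logits, temperature=1.0))  # random draw from the softmax
```

The caveat, and the whole point upthread, is that this only makes the sampling step deterministic; the logits themselves can still differ run to run depending on batching and kernel choices, which is a separate issue from how you sample.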