LLMs could be, but shouldn't be compilers

Posted by alpaylan 16 hours ago

LLMs could be, but shouldn't be compilers(alperenkeles.com)

108 points | 119 commentspage 3

lfsss 14 hours ago|

You want to fly on an AI-developed airplane. I don't (just kidding haha).

ryanschneider 15 hours ago||

I’m kind of surprised no one has mentioned this one yet: https://www.lesswrong.com/posts/gQyphPbaLHBMJoghD/comp-sci-i...

Daviey 14 hours ago||

If you have decent unit and functional tests, why do you care how the code is written?

This feels like the same debate assembly programmers had about C in the 60s. "You don’t understand what the compiler is doing, therefore it’s dangerous". Eventually we realised the important thing isn’t how the code was authored but whether the behaviour is correct, testable, and maintainable.

If code generated by an LLM:

  - passes a real test suite (not toy tests),
  - meets performance/security constraints,
  - goes through review like any other change,

then the acceptance criteria haven’t changed. The test suite is part of the spec. If the spec is enforced in CI, the authoring tool is secondary.

The real risk isn’t "LLMs as compilers", it’s letting changes bypass verification and ownership. We solved that with C, with large dependency trees, with codegen tools. Same playbook applies here.

If you give expected input and get expected output, why does it matter how the code was written?

shauhss 13 hours ago|

Because testing at this level is a likely impossible across all domains of programming. You can narrow the set of inputs and get relatively far, but the more complex the systems the broader the space of problems becomes. And even a simple crud app on an EC2 has a lot more failure modes than people are able to test for with current tools.

> passes a real test suite (not toy tests)

“not toy tests” is doing a lot of heavy lifting here. Like an immeasurable amount of lifting.

lunarboy 15 hours ago||

Are LLMs not already compilers? They translate human natural language to code pretty well now. But yeah, they probably don't fit the bill of English based code to machine code

rvz 14 hours ago|

> Are LLMs not already compilers? They translate human natural language to code pretty well now.

Can you formally verify prose?

> But yeah, they probably don't fit the bill of English based code to machine code

Which is why LLMs cannot be compilers that transform code to machine code.

jerf 15 hours ago||

A lot of people are mentally modeling the idea that LLMs are either now or will eventually be infinitely capable. They are and will stubbornly persist in being finite, no matter how much capacity that "finite" entails. For the same reason that higher level languages allow humans to worry less about certain details and more about others, higher level languages will allow LLMs to use more of their finite resources on solving the hard problems as well.

Using LLMs to do something like what a compiler can already do is also modelling LLMs as infinite rather than finite. In fact in this particular situation not only are they finite, they're grotesquely finite, in particular, they are expensive. For example, there is no world where we just replace our entire infrastructure from top to bottom with LLMs. To see that, compare the computational effort of adding 10 8-digit numbers with an LLM versus a CPU. Or, if you prefer something a bit less slanted, the computational costs of serving a single simple HTTP request with modern systems versus an LLM. The numbers run something like LLMs being trillions of times more expensive, as an opening bid, and if the AIs continue to get more expensive it can get even worse than that.

For similar reasons, using LLMs as a compiler is very unlikely to ever produce anything even remotely resembling a payback versus the cost of doing so. Let the AI improve the compiler instead. (In another couple of years. I suspect today's AIs would find it virtually impossible to significatly improve an already-optimized compiler today.)

Moreover, remember, oh, maybe two years back when it was all the rage to have AIs be able to explain why they gave the answer they did? Yeah, I know, in the frenzied greed to be the one to grab the money on the table, this has sort of fallen by the wayside, but code is already the ultimate example of that. We ask the LLM to do things, it produces code we can examine, and the LLM session then dies away leaving only the code. This is a good thing. This means we can still examine what the resulting system is doing. In a lot of ways we hardly even care what the LLM was "thinking" or "intending", we end up with a fantastically auditable artifact. Even if you are not convinced of the utility of a human examining it, it is also an artifact that the next AI will spend less of its finite resources simply trying to understand and have more left over to actually do the work.

We may find that we want different programming languages for AIs. Personally I think we should always try to retain that ability for humans to follow it, even if we build something like that. We've already put the effort into building AIs that produce human-legible code and I think it's probably not that great a penalty in the long run to retain that. At the moment it is hard to even guess what such a thing would look like, though, as the AIs are advancing far faster than anyone (or any AI) could produce, test, prove out, and deploy such a language, against the advantage of other AIs simply getting better at working with the existing coding systems.

rvz 15 hours ago||

Anyone who knows 0.1% about LLMs should know that they are not deterministic systems and are totally unpredictable with their outputs meaning that they cannot become compilers at all.

The obvious has been stated.

pjmlp 12 hours ago||

Anyone that knows 0.1% about GC and JIT compilers also knows how hard is to have deterministic behaviours, and how much their behaviours are driven by heuristics.

WithinReason 15 hours ago||

Anyone who knows 0.2% about LLMs should know that they can be sampled deterministically, and yet that doesn't change the argument.

rvz 15 hours ago||

We do not trust them (LLMs) 100% to reliably emit correct assembled code (why would anyone) compared with a compiler which the latter is deterministic and the former is fundamentally stochastic, no matter how you sample them.

LLMs are not designed for that.

hackinthebochs 14 hours ago||

There's almost a good point here, but you're misusing concepts that obfuscate the point you're trying to make. Determinism is about producing the same output given the same input. In this sense, LLMs are fundamentally deterministic. Inference produces scores for every word in their vocabulary. This score map is then sampled from according to the temperature to produce the next token. But this non-determinism is artificially injected.

But the determinism/non-determinism axis isn't the core issue here. The issue is that they are trained by gradient descent which produces instability/unpredictability in its output. I can give it a set of rules and a broad collection of examples in its context window. How often it will correctly apply the supplied rules to the input stream is entirely unpredictable. LLMs are fundamentally unpredictable as a computing paradigm. LLMs training process is stochastic, though I hesitate to call them "fundamentally stochastic".

rvz 13 hours ago||

> Determinism is about producing the same output given the same input. In this sense, LLMs are fundamentally deterministic.

You cannot formally verifiy prose or the text that LLMs generates when attempting to compare what a compiler does. So even in this sense that is completely false.

No-one can guarrantee that the outputs will be 100% to what the instructions you are giving to the LLM, which is why you do not trust it. As long as it is made up of artificial neurons that predict the next token, it is fundamentally a stochastic model and unpredictable.

One can maliciously craft an input to mess up the network to get the LLM to produce a different output or outright garbage.

Compilers have reproducable builds and formal verification of their functionality. No such thing with LLMs exist. Thus, comparing LLMs to a compiler and suggesting that LLMs are 'fundamentally deterministic' or is even more than a compiler is completely absurd.

hackinthebochs 12 hours ago||

You're just using words incorrectly. Deterministic means repeatable. That's it. Predictable, verifiable, etc are tangential to deterministic. Your points are largely correct but you're not using the right words which just obfuscates your meaning.

rvz 3 hours ago||

Nope. You have not shown how a large scale collection of neural networks irrespective of their architecture is more deterministic when compared to a 'compiler' and only repeating a known misconception of tweaking the temperature to 0 which does not bring the determinism you claim it brings with LLMs [0] [1] [2], otherwise you would not have this problem in the first place.

By even doing that, the result of the outputs are useless anyway. So this really does not help your point at all. So therefore:

> You're just using words incorrectly. Deterministic means repeatable. That's it. Predictable, verifiable, etc are tangential to deterministic.

There is nothing deteministic or predictable about an LLM even when you compare it to a compiler, unless you can guarrantee that the individual neurons through inference give a predictable output which would be useful enough for being a drop-in compiler replacement.

[0] https://152334h.github.io/blog/non-determinism-in-gpt-4/

[1] https://arxiv.org/pdf/2506.09501

[2] https://thinkingmachines.ai/blog/defeating-nondeterminism-in...

hackinthebochs 1 hour ago||

Yes, there's some unknown sources of non-determinism when running production LLM architectures at full capacity. But that's completely irrelevant to the point. The core algorithm is deterministic. And you're still conflating deterministic and predictable. It's strange to have such disregard for the meaning of words and their correct usage.

fragmede 15 hours ago||

In the comparison to compilers, it relevant to point out that work began on them in the 1950's. That they're basically solid by the time most people here used them, should be looked at with that time frame in mind. ChatGPT came out in 2022, 3-4 years ago. Compilers have had around three quarters of a century years to get where they are today. I'll probably be dead in seventy years, nevermind have any idea what AI (or society) is going to look like then!

But for reference, we don't (usually) care which register three compiler uses for which variable, we just care that it works, with no bugs. If the non-dertetminism of LLMs mean the variable is called file, filename, or fileName, file_name, and breaking with convention, why do we care? At the level Claude let's us work with code now, it's immaterial.

Compilation isn't stable. If you clear caches and recompile, you don't get a bit-for-bit exact copy, especially on today's multi-core processors, without doing extra work to get there.

SpicyLemonZest 15 hours ago|

But the reason we don't care which register the compiler uses is that compilers, even without strict stability, reliably enforce abstractions that free us from having to care. If your compiler decided on 5% of inputs that it just doesn't feel like using more than two data registers, you'd have to think about it on 100% of inputs.

kittikitti 15 hours ago||

"LlMs HAlLuCinATE"

Stop this. This is such a stupid way way of describing mistakes from AI. Please try to use the confusion matrix or any other way. If you're going to try and make arguments, it's hard to take them seriously if you keep regurgitating that LLM's hallucinate. It's not a well defined definition so if you continually make this your core argument, it becomes disingenuous.

dgxyz 14 hours ago|

How about "expected poor ratio of corn to shit".?

jtrn 14 hours ago||

That was a painfull read for me. It reminds me of a specific annoyance I had at university with a professor who loved to make sweeping, abstract claims that sounded incredibly profound in the lecture hall but evaporated the moment you tried to apply them. It was always a hidden 'I-am-very-smart' attempt that fell apart if you actually deconstructed the meaning, the logic, or the claimed results. This article is the exact same breed of intellectualizing. It feels deep, but there is no actual logical hold if you break up the claims and deductive steps.

You can see it clearly if you just translate the article's expensive vocabulary into plain English. When the author writes, 'When you hand-build, the space of possibilities is explored through design decisions you’re forced to confront,' they are just saying, 'When you write code yourself, you have to choose how to write it.' When they claim, 'contextuality is dominated by functional correctness,' they just mean, 'Usually, we just care if the code works.' When they warn about 'inviting us to outsource functional precision itself,' they really mean, 'LLMs let you be lazy.' And finaly, 'strengthening the will to specify,' is just a dramatic way of saying, 'We need to write better requirements.' It is obscurantism plain and simple. using complexity to hide the fact that the insight is trivial.

But that is just an estethical problem to me. Worse. The argument collapses entirely when you look at the logical leap between the premises.

The author basically argues that because Natural Language is vague, engineers will inevitably stop caring about the details and just accept whatever reasonable output the AI gives. This is pure armchair psychology. It assumes that just because the tool allows for vagueness, professionals will suddenly abandon the concept of truth or functional requirements. That is a massive, unsubstantiated jump.

If we use fuzzy matching to find contacts on our phones all the time. Just because the search algorithm is imprecise doesn't mean we stop caring if we call the right person. We don't say, 'Well, the fuzzy match gave me Bob instead of Bill, I guess I'll just talk to Bob now.' The hard constraint, the functional requirement of talking to the specific person you need, remains absolute. Similarly, in software, the code either compiles and passes the tests, or it doesn't. The medium of creation might be fuzzy, but the execution environment is binary. We aren't going to drift into accepting broken banking software just because the prompt was in English.

This entire essay feels like those social psychology types that now have been thoroughly been discredited by the replication crisis in psychology. The ones who are where concerned with dazzling people with verbal skills than with being right. It is unnecessarily complex, relying on projection of dreamt up concepts and behavior, rather than observation. THIS tries to sound profound by turning a technical discussion into a philosophical crisis, but underneath the word salad, it is not just shallow, it is wrong.

genie3io 15 hours ago|

[dead]