
Posted by signa11 6 hours ago

A sufficiently detailed spec is code (haskellforall.com)
334 points | 179 comments
bad_username 1 hour ago|
> There is no world where you input a document lacking clarity and detail and get a coding agent to reliably fill in that missing clarity and detail

That is not true, and the proof is that LLMs _can_ reliably generate (relatively small amounts of) working code from relatively terse descriptions. Code is the detail being filled in. Furthermore, LLMs are the ultimate detail fillers, because they are language interpolation/extrapolation machines. And their popularity is precisely because they are usually very good at filling in details: LLMs use their vast knowledge to guess what detail to generate, so the result usually makes sense.

This doesn't detract much from the main point of the article though. Sometimes the interpolated detail is wrong (and nondeterministic), so if a reliable result is to be achieved, important details have to be constrained, and for that they have to be specified. And whereas we have decades of tools and culture for coding, we largely don't have that for extremely detailed specs (except maybe at NASA or similar places). We could figure it out in the future, but we haven't yet.

Someone 1 hour ago||
> That is not true, and the proof is that LLMs _can_ reliably generate (relatively small amounts of) working code from relatively terse descriptions.

LLMs can generate (relatively small amounts of) working code from relatively terse descriptions, but I don’t think they can do so _reliably_.

They’re more reliable the shorter the code fragment and the more common the pattern, but they do break down for complex descriptions. For example, try tweaking the description of a widely-known algorithm just a little bit and see how well the generated code follows the spec.
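A concrete instance of the kind of tweak meant here (a hypothetical example, not one from the thread): change binary search's spec from "find the target" to "return the index of the first element >= target". A reference oracle to check generated code against is short:

```python
import bisect

def leftmost_geq(xs, target):
    # Tweaked spec: return the index of the FIRST element >= target
    # (len(xs) if every element is smaller), not the classic
    # "any index where the target is found".
    lo, hi = 0, len(xs)
    while lo < hi:
        mid = (lo + hi) // 2
        if xs[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return lo

# Sanity check against the stdlib's bisect_left, which implements this spec.
assert all(leftmost_geq([1, 3, 3, 7], t) == bisect.bisect_left([1, 3, 3, 7], t)
           for t in range(9))
```

A generated implementation that drifts back to the well-known variant will pass on unique elements and fail on duplicates, which is exactly the kind of subtle spec deviation being described.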

> Sometimes the interpolated detail is wrong (and nondeterministic), so if a reliable result is to be achieved

Seems you agree they _cannot_ reliably generate (relatively small amounts of) working code from relatively terse descriptions.

mike_hearn 11 minutes ago||
Neither can humans, but the industry has decades of experience with how to instruct and guide human developer teams using specs.
lmm 1 hour ago|||
> LLMs _can_ reliably generate (relatively small amounts of) working code from relatively terse descriptions. Code is the detail being filled in.

They can generate boilerplate, sure. Or they can expand out a known/named algorithm implementation, like pulling in a library. But neither of those is generating detail that wasn't there in the original (at most it pulls in the detail from somewhere in the training set).

tibbe 29 minutes ago||
They do more than that. If you ask for a UI with a button, that button won't be upside down even if you didn't specify its orientation. Lots of the detail can be inferred from general human preferences, which are present in the LLMs' training data. This extends way beyond CS stuff like details of algorithm implementations.
skywhopper 8 minutes ago||
That’s exactly what they said. Details “elsewhere in its training set”.
skywhopper 9 minutes ago||
“LLMs _can_ reliably generate (relatively small amounts of) working code from relatively terse descriptions”

Only with well-known patterns that represent shared knowledge specified elsewhere. If the details they “fill in” each time differ in ways that change behavior, then the spec is deficient.

If we “figure out” how to write such detailed specs in the future, as you suggest, then that becomes the “code”.

hintymad 4 hours ago||
> A sufficiently detailed spec is code

This is exactly the argument in Brooks' No Silver Bullet. I still believe that it holds. However, my observation is that many people don't really need that level of detail. When one prompts an AI to "write me a to-do list app", what they really mean is that "write me a to-do list app that is better than I have imagined so far", which does not really require a detailed spec.

mfabbri77 3 hours ago||
Yes. This happens because the training data contains countless SotA "to-do" apps. This argument does not scale well to other types of software.
baxtr 2 hours ago|||
Isn’t most standard software these days a permutation of things already done before?
Gabriel439 2 hours ago|||
Author here: it's not even clear that agents can reliably permute their training data (I'm not saying that it's impossible or never happens but that it's not something we can take for granted as a reliable feature of agentic coding).

As I mentioned in one of the footnotes in the post:

> People often tell me "you would get better results if you generated code in a more mainstream language rather than Haskell" to which I reply: if the agent has difficulty generating Haskell code then that suggests agents aren't capable of reliably generalizing beyond their training data.

If an agent can't consistently apply concepts learned in one language to generate code in another language, then that calls into question how good they are at reliably permuting the training dataset in the way you just suggested.

mike_hearn 9 minutes ago|||
Your argument is far too dependent on observations made about the model's ability with Haskell, which is irrelevant. The concepts in Haskell are totally different from those in almost any other language - you can't easily "generalize" from an imperative, strict language (basically everything people really use) to a lazy, pure FP language like Haskell that uses monads for IO. The underlying concepts themselves are different, and Haskell has never been mainstream enough for models to get good at it.

Pick a good model, let it choose its own tools and then re-evaluate.

rytis 1 hour ago||||
> if the agent has difficulty generating Haskell code then that suggests agents aren't capable of reliably generalizing beyond their training data.

Doesn't that apply to flesh-and-bone developers? Ask someone who's only working in Python to implement their current project in Haskell, and I'm not so sure you'll get very satisfying results.

Frieren 1 hour ago|||
> doesn't that apply to flesh-and-bone developers?

No, it does not. If you have a developer that knows C++, Java, Haskell, etc. and you ask that developer to re-implement something from one language to another the result will be good. That is because a developer knows how to generalize from one language (e.g. C++) and then write something concrete in the other (e.g. Haskell).

cassianoleal 22 minutes ago||||
Your argument fails where it equates someone who only codes in one language with an LLM that is typically trained on many languages.

In my experience, a software engineer knows how to program and has experience in multiple languages. Someone with that level of experience tends to pick up new languages very quickly because they can apply the same abstract concepts and algorithms.

If an LLM that has a similar (or broader) data set of languages cannot generalise to an unknown language, then it stands to reason that it is indeed only capable of reproducing what’s already in its training data.

ozlikethewizard 1 hour ago||||
The hard bit of programming has never been knowing the symbols to tell the computer what to do. It is more difficult to use a completely unknown language, sure, but the paradigms and problem-solving approaches are identical, and that's the actual work, not writing the correct words.
lukevp 24 minutes ago||
Saying that the paradigms of Python and Haskell are the same makes it sound like you don’t know either or both of those languages. They are not just syntactically different; the paradigms literally are different. Python is a high-level, duck-typed OO scripting language and Haskell is a non-OO, strongly typed functional programming language. They’re extremely far apart.
debugnik 1 hour ago|||
But the model has seen pretty much all the public Haskell code around, and possibly been trained to write it in different settings.
graemep 1 hour ago||||
I am very sceptical mainstream languages will be better. I have seen plenty of bad Python from LLMs. Even with simple CRUD apps and when provided with detailed instructions.
lukan 1 hour ago||||
"that suggests agents aren't capable of reliably generalizing beyond their training data."

Yes? If they could, we would have strong general intelligence by now, and only a few people are claiming that.

ChrisGreenHeur 1 hour ago|||
It can also mean that the other programming language is beyond the cognitive abilities of the LLM.
loveparade 2 hours ago||||
But what's the point of re-building "standard software" if it is so standard that it already exists 100 times in the training data with slight variations?
ChrisGreenHeur 1 hour ago|||
The point is the small variations
lynx97 1 hour ago|||
I read this attitude very often on HN. "If someone else has already built it before, your effort is a waste of time." To me, it has this vibe of "Someone else already makes money from it, go somewhere else where you don't have competition." Well, I get the drift... But... Not everyone is into getting rich. You know, some of us just have fun building things and learning while doing so. It really doesn't matter if the path has been walked before. Not everything has to be plain novelty to count.
loveparade 1 hour ago||
If you do it for fun then why do you care whether an LLM can do it well or not, which was the original argument? Shouldn't matter to you in that case.
roarcher 2 hours ago||||
I'd say that's pretty much the definition of standard, yeah. And it's why you can't make a profit selling a simple ToDo app. If you expect people to pay for what you build, you have to build something that doesn't have a thousand free clones on the app store.
baxtr 2 hours ago||
I politely disagree.

I think you’re conflating software and product.

A product can be a recombination of standard software components and yet be something completely new.

layer8 2 hours ago||||
That isn’t saying much. Every piece of software is a permutation of zeros and ones. The novelty or ingenuity, or just quality and fitness for purpose, can lie in the permutation you come up with. And an LLM is limited by its training in the permutations it is likely to come up with, unless you give it heaps of specific guidance on what to do.
mfabbri77 2 hours ago||||
In my experience, the further you move away from the user and toward the hardware and fundamental theoretical algorithms, the less true this becomes.

This is very true for an email client, but very untrue for an innovative 3D rendering engine technology (just an example).

layer8 2 hours ago|||
An email client is highly nontrivial, due to the complexities of the underlying standards, and how the real implementations you have to be compatible with don’t strictly follow them. Making an email client that doesn’t suck and is fully interoperable is quite an ambitious endeavor.
mfabbri77 2 hours ago||
The point was to answer the question: "Can every piece of software be viewed as a permutation of software that has already been developed?" In my opinion, an email client is a more favorable example than a 3D engine. In fields where it is necessary to differentiate, improve, or innovate at the algorithmic level, where research and development play a fundamental role, it is not simply a matter of permuting software or leveraging existing software components by simply assembling them more effectively.
Archer6621 1 hour ago||
Actually, in the specific case of a 3D program, it's the current generation of LLMs' complete lack of ability in spatial reasoning that prevents them from "understanding" what you want when you ask it to e.g. "make a camera that flies in the direction you are looking at".

It necessarily has to derive it from examples of cameras that fly forward that it knows about, without understanding the exact mathematical underpinnings that allow you to rotate a 3D perspective camera and move along its local coordinate system, let alone knowing how to verify whether its implementation functions as desired, often resulting in dysfunctional garbage. Even with a human in the loop that provides it with feedback and grounds it (I tried), it can't figure this out, and that's just a tiny example.

Math is precise, and an LLM's fuzzy approach is therefore a bad fit for it. It will need an obscene amount of examples to reliably "parrot" mathematical constructs.
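For scale, the math the camera example needs is only a few lines; a minimal sketch, assuming an OpenGL-style convention (yaw about +Y, pitch about +X, -Z straight ahead):

```python
import math

def camera_forward(yaw, pitch):
    # Unit "look" vector; assumes an OpenGL-style convention:
    # yaw rotates about +Y, pitch about +X, and -Z is straight ahead.
    return (-math.sin(yaw) * math.cos(pitch),
            math.sin(pitch),
            -math.cos(yaw) * math.cos(pitch))

def fly(position, yaw, pitch, speed, dt):
    # Move the camera along its own look direction.
    f = camera_forward(yaw, pitch)
    return tuple(p + speed * dt * c for p, c in zip(position, f))
```

With yaw = pitch = 0 the camera moves straight down -Z; whether that matches a given engine depends on its handedness and axis conventions, which is precisely the kind of detail an LLM has to guess when the prompt doesn't pin it down.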

debugnik 1 hour ago||
> "make a camera that flies in the direction you are looking at"

That's not the task of a renderer though, but its client, so you're talking past your parent comment. And given that I've seen peers one-shot tiny Unity prototypes with agents, I don't really believe they're that bad at taking an educated guess at such a simple prompt, as much as I wish it were true.

Archer6621 1 hour ago||
You're right. My point was more that LLMs are bad at (3D) math and spatial reasoning, which applies to renderers. Since Unity neatly abstracts this complexity away through an API that corresponds well to spoken language, and is quite popular, that same example and similar prototypes should have a higher success rate.

I guess the less detailed a spec has to be thanks to the tooling, the more likely it is that the LLM will come up with something usable. But it's unclear to me whether that is because of more examples existing due to higher user adoption, or because of fewer decisions/predictions having to be made by the LLM. Maybe it is a bit of both.

fmbb 2 hours ago|||
I would be surprised if there are more working email clients out there than working 3D engines. The gaming market is huge, most people do not pay to use email, hobbyists love creating game engines.
umanwizard 2 hours ago||
Idk, a working basic email client is just not that hard to write though. SMTP and IMAP are simple protocols and the required graphical interface is a very straightforward combination of standard widgets.
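As a sketch of how far the standard library alone goes on the send side (host and addresses here are placeholders, not real servers):

```python
import smtplib
from email.message import EmailMessage

def build_message(sender, recipient, subject, body):
    # RFC 5322 message construction is entirely stdlib.
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg

def send(host, msg):
    # Placeholder host; a real server would also need starttls() and login().
    with smtplib.SMTP(host) as conn:
        conn.send_message(msg)
```

The receive side is similarly covered by imaplib; the complexity the sibling comment mentions shows up in MIME edge cases and non-conforming servers, not in the basic protocol plumbing.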
wongarsu 2 hours ago|||
Most software written today (or 10 years ago, or 50 years ago) is not particularly unique. And even in that software that is unusual you usually find a lot of run-of-the-mill code for the more mundane aspects
smackeyacky 2 hours ago|||
I don’t think this is true. I’ve been doing this since the 1980s, and while you might think code is fairly generic, most people aren’t shipping apps; they’re working on quiet little departmental systems, or trying to patch ancient banking systems, and getting a greenfield gig is pretty rare in my experience.

So for me the code is mundane but it’s always unique and rarely do you come across the same problems at different organisations.

If you ever got a spec good enough to be the code, I’m sure Claude or whatever could absolutely ace it, but the spec is never good enough. You never get the context of where your code will run, who will deploy it or what the rollback plan is if it fails.

The code isn’t the problem and never was. The problem is the environment where your code is going.

The proof is bit rot. Your code might have been right 5 years ago but isn’t any more because the world shifted around it.

I am using Claude pretty heavily, but there are some problems it is awful at. E.g. I had a crusty old classic ASP website to resuscitate this week and it would not start. Claude suggested all the things I half remembered from back in the day, but the real reason was that Microsoft disabled VBScript in Windows 11 24H2, and that wasn’t even on its radar.

I have to remind myself that it’s a fancy xerox machine because it does a damn good job of pretending otherwise.

nostrademons 2 hours ago|||
Most of the economically valuable software written is pretty unique, or at least is one of few competitors in a new and growing niche. This is because software that is not particularly unique is by definition a commodity, with few differentiators. Commodity software gets its margins competed away, because if you try to price high, everybody just uses a competitor.

So goes the AI paradox: it's really effective at writing lots and lots of software that is low value and probably never needed to get written anyway. But at least right now (this is changing rapidly), executives are very willing to hire lots of coders to write software that is low value and probably doesn't need to be written, and VCs are willing to fund lots of startups to automate the writing of lots of software that is low value and probably doesn't need to be written.

philipp-gayret 2 hours ago|||
Could you give some examples? I can only imagine completely proprietary technology like trading or developing medicine. I have worked in software for many years and was always paid well for it. None of it was particularly unique in any way. Some of it better than others, but if you could show that there exists software people pay well for that AI cannot make I would be really impressed. With my limited view as software engineer it seems to me that the data in the product / its users is what makes it valuable. For example Google Maps, Twitter, AirBnB or HN.
Toutouxc 1 hour ago|||
All it takes is a sufficiently big pile of custom features interacting. I work on a legal tech product that automates documents. Coincidentally, I'm just wrapping up a rewrite of the "engine" that evaluates how the documents will come out. The rewrite took many months, the code uses graph algorithms and contains a huge amount of both domain knowledge and specific product knowledge.

Claude Code is having the hardest time making sense of it and not breaking everything every step of the way. It always wants to simplify, handwave, "if we just" and "let's just skip if null", it has zero respect for the amount of knowledge and nuance in the product. (Yes, I do have extensive documentation and my prompts are detailed and rarely shorter than 3 paragraphs.)

krethh 2 hours ago||||
You know how whenever you shuffle a deck of cards you almost certainly create an order that has never existed before in the universe?

Most software does something similar. Individual components are pretty simple and well understood, but as you scale your product beyond the simple use cases ("TODO apps"), the interactions between these components create novel challenges. This applies to both functional and non-functional aspects.

So if "cannot make with AI" means "the algorithms involved are so novel that AI literally couldn't write one line of them", then no - there isn't a lot of commercial software like that. But that doesn't mean most software systems aren't novel.
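The shuffling claim is plain arithmetic: 52! is about 8 x 10^67, while even a generous bound on shuffles ever performed is around 10^15:

```python
import math

deck_orderings = math.factorial(52)   # distinct orders of a 52-card deck
assert deck_orderings > 8 * 10**67    # roughly 8.07e67

# Generous upper bound on shuffles ever performed:
# ~1e10 humans x ~1e5 shuffles each = 1e15,
# a vanishing fraction of the possible orderings.
shuffles_ever = 10**15
assert shuffles_ever < deck_orderings
```

The analogy carries over: even if every component is standard, the space of component combinations is combinatorially vast, so most real systems occupy a point nobody has built before.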

nostrademons 2 hours ago|||
Were you around when any of Google Maps, Twitter, AirBnB, or HN were first released? Aside from AirBnB (whose primary innovation was the business model, and hitting the market right during the global financial crisis when lots of families needed extra cash), they were each architecturally quite different from software that had come before.

Before Google Maps nobody had ever pushed a pure-Javascript AJAX app quite so far; it came out just as AJAX was coined, when user expectations were that any major update to the page required a full page refresh. Indeed, that's exactly what competitor MapQuest did: you had to click the buttons on the compass rose to move the map, it moved one step at a time, and it fully reloaded the page with each move. Google Maps's approach, where you could just drag the map and it loaded the new tiles in the background offscreen, then positioned and cropped everything with Javascript, was revolutionary. Then add that it gained full satellite imagery soon after launch, which people didn't know existed in a consumer app.

Twitter's big innovation was the integration of SMS and a webapp. It was the first microblog, where the idea was that you could post to your publicly-available timeline just by sending an SMS message. This was in the days before Twilio, where there was no easy API for sending these, you had to interface with each carrier directly. It also faced a lot of challenges around the massive fan-out of messages; indeed, the joke was that Twitter was down more than it was up because they were always hitting scaling limits.

HN has (had?) an idiosyncratic architecture where it stores everything in RAM and then checkpoints it out to disk for persistence. No database, no distribution, everything was in one process. It was also written in a custom dialect of Lisp (Arc) that was very macro-heavy. The advantage of this was that it could easily crank out and experiment with new features and new views on the data. The other interesting thing about it was its application of ML to content moderation, and particularly its willingness to kill threads and shadowban users based on purely algorithmic processes.
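The architecture described (all state in one process's memory, periodically written out whole) can be sketched in a few lines; this toy Python version is illustrative only, not HN's actual Arc implementation:

```python
import os
import pickle
import tempfile

class RamStore:
    # Toy version of the pattern: all state lives in one dict in RAM,
    # and checkpoint() writes the whole thing to disk at once.
    def __init__(self, path):
        self.path = path
        self.data = {}
        if os.path.exists(path):
            with open(path, "rb") as f:
                self.data = pickle.load(f)

    def checkpoint(self):
        tmp = self.path + ".tmp"
        with open(tmp, "wb") as f:
            pickle.dump(self.data, f)
        os.replace(tmp, self.path)  # atomic rename: a crash can't truncate state

# State survives a "process restart":
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "state.pkl")
    store = RamStore(path)
    store.data["answer"] = 42
    store.checkpoint()
    assert RamStore(path).data == {"answer": 42}
```

The appeal is exactly what the comment says: with no database layer, every feature is just in-memory data manipulation, at the cost of durability between checkpoints and a hard ceiling on data size.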

pjmlp 2 hours ago|||
Agencies have switched to SaaS products and integrations via serverless or low code tooling, exactly because there is already too much of the same.
lmm 3 hours ago|||
> When one prompts an AI to "write me a to-do list app", what they really mean is that "write me a to-do list app that is better than I have imagined so far", which does not really require a detailed spec.

If someone was making a serious request for a to-do list app, they presumably want it to do something different from or better than the dozens of to-do list apps that are already out there. Which would require them to somehow explain what that something was, assuming it's even possible.

ms_menardi 3 hours ago|||
It could be an issue of discoverability too. Maybe they just haven't found the to-do app that does what they want, and it's easier to just... make one from scratch.
carlmr 2 hours ago||
Which is not getting better.

I'd pay you 10€ for a TODO app that improved my life meaningfully. It would obviously need to have great UX and be stable. Those are table stakes.

I don't have the time to look at all these apps though. If somebody tells me they made a great TODO app, I'm already mentally filtering them out. There's just too much noise here.

Does your TODO app solve any meaningful problem beyond the bare minimum? Does it solve your procrastination? Does it remind you at the right time?

If it doesn't answer this in the first 2 seconds of your pitch you're out.

pixelbart 1 hour ago|||
Would a musician refrain from writing a love song because there are already better love songs?
lmm 1 hour ago||
> Would a musician refrain from writing a love song because there are already better love songs?

Yes; at least, I would hope a musician who was writing a love song was doing so because they want it to do something different from or better than other existing love songs. (Or they might be doing it to practice their songwriting skills - just as a programmer might write a todo app to practice their programming skills - but it makes no sense to use an AI for that)

smartmic 47 minutes ago|||
I wouldn't say this is the core argument of No Silver Bullet. I wrote a short review of Brooks' paper with respect to today's AI promises, for whoever is interested in more details:

https://smartmic.bearblog.dev/no-ai-silver-bullet/

Animats 1 hour ago|||
Not entirely.

For some problems, it is. Web front-end development, for example. If you specify what everything has to look like and what it does, that's close to code.

But there are classes of problems where the thing is easy to specify, but hard to do correctly, or fast, or reliably. Much low-level software is like that. Databases, file systems, even operating system kernels. Networking up to the transport layer. Garbage collection. Eventually-consistent systems. Parallel computation getting the same answer as serial computation. Those problems yield, with difficulty, to machine checked formalism.

In those areas, systems where AI components struggle to get code that will pass machine-checked proofs have potential.
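Sorting makes the "easy to specify, hard to do well" gap concrete: the complete spec is two checkable properties, while a fast implementation meeting them is where the difficulty lives. A machine-checkable oracle for the spec, as a sketch:

```python
from collections import Counter

def satisfies_sort_spec(inp, out):
    # The entire spec of sorting, stated as two checkable properties:
    # the output is ordered, and it is a permutation of the input.
    ordered = all(a <= b for a, b in zip(out, out[1:]))
    permutation = Counter(out) == Counter(inp)
    return ordered and permutation

assert satisfies_sort_spec([3, 1, 2], [1, 2, 3])
assert not satisfies_sort_spec([3, 1, 2], [1, 2])     # dropped an element
assert not satisfies_sort_spec([3, 1, 2], [2, 1, 3])  # not ordered
```

For the domains listed above (file systems, consistency, parallelism), the specs are similarly compact while correct implementations are notoriously hard, which is what makes machine-checked proof a natural gate for AI-generated code there.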

ozim 2 hours ago|||
Everyone has at least heard stories of clients who just want that button 5px to the right or to the left, and by the next meeting want it in the bottom corner - even though it makes no functional difference.

But most of the time it's not that they want it for objective technical reasons.

They want it because they want to see if they can push you. They do it "because they can". They do it because later they can renegotiate, or just nag and maybe pay less. Multiple reasons that are not technical.

ChrisMarshallNY 3 hours ago|||
But if you’re selling that to-do list app, then the rules are different, and that spec is required.

I guess it depends on whether or not we want to make money, or otherwise, compete against others.

throwaway27448 1 hour ago|||
In this case a chatbot is also unlikely to succeed in pleasing the user—and how could it?
svara 1 hour ago||
The vibe coding maximalist position can be stated in information-theoretic terms: that there exists a decoder that can decode the space of useful programs from a much smaller prompt space.

The compression ratio is the vibe coding gain.

I think that way of phrasing it makes it easier to think about boundaries of vibe coding.

"A class that represents (A) concept, using the (B) data structure and (C) algorithms for methods (D), in programming language (E)."

That's decodable, at least to a narrow enough distribution.
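Filling the template with hypothetical values, say (A) a multiset, (B) a dict, (C) counting, (D) add/remove/count, (E) Python, decodes to essentially one program:

```python
class Bag:
    # (A) a multiset concept, (B) backed by a dict, (C) simple counting
    # algorithms, (D) add/remove/count methods, (E) in Python.
    def __init__(self):
        self._counts = {}

    def add(self, item):
        self._counts[item] = self._counts.get(item, 0) + 1

    def remove(self, item):
        n = self._counts.get(item, 0)
        if n <= 1:
            self._counts.pop(item, None)  # removing the last copy deletes the key
        else:
            self._counts[item] = n - 1

    def count(self, item):
        return self._counts.get(item, 0)
```

The one-sentence prompt pins down nearly every line; the remaining freedom (e.g. what remove does on a missing item) is exactly the residual ambiguity the decoder has to guess.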

"A commercially successful team communication app built around the concept of channels, like in IRC."

Without already knowing Slack, that's not decodable.

Thinking about what is missing is very helpful. Obviously, the business strategic positioning, non technical stakeholder inputs, UX design.

But I think it goes beyond that: In sufficiently complex apps, even purely technical "software engineering" decisions are to some degree learnt from experiment.

This also makes it more clear how to use AI coding effectively:

* Prompt in increments of components that can be encoded in a short prompt.

* If possible, add pre-existing information to the prompt (documentation, prior attempts at implementation).

rdevilla 5 hours ago||
I think it's only a matter of time before people start trying to optimize model performance and token usage by creating their own more technical dialect of English (LLMSpeak or something). It will reduce both ambiguity and token usage by using a highly compressed vocabulary, where very precise concepts are packed into single words (monads are just monoids in the category of endofunctors, what's the problem?). Grammatically, expect things like the Oxford comma to emerge that reduce ambiguity and rounds of back-and-forth clarification with the agent.

The uninitiated can continue trying to clumsily refer to the same concepts, but with 100x the tokens, as they lack the same level of precision in their prompting. Anyone wanting to maximize their LLM productivity will start speaking in this unambiguous, highly information-dense dialect that optimizes their token usage and LLM spend...

grey-area 1 hour ago||
Have you just reinvented programming languages and reinforced the author's point?

Setting aside the problem of training, why bother prompting if you’re going to specify things so tightly that it resembles code?

mike_hearn 5 minutes ago|||
Programming languages admit only unambiguous text. What he's proposing is more like EARS, Gherkin or Planguage.
majormajor 4 hours ago|||
Unless you're training your own model, wouldn't you have to send this dialect in your context all the time? Since the model is trained on all the human language text of the internet, not on your specialized one? At which point you need to use human language to define it anyway? So perhaps you could express certain things with less ambiguity once you define that, but it seems like your token usage will have to carry around that spec.
nomel 4 hours ago|||
Let's use a non-ambiguous language for this. May I suggest Lojban [1][2]?

[1] https://en.wikipedia.org/wiki/Lojban

[2] Someone speaking it: https://www.youtube.com/watch?v=lxQjwbUiM9w

mike_hearn 3 minutes ago|||
Lojban allows you to speak ambiguously, it just disallows grammatical ambiguity because in the 70s it was hypothesized that NLP understanding was impossible so humans would have to adapt instead of computers. That debate is over; understanding grammar is solved. The new debate is over semantic ambiguity.
dooglius 4 hours ago||||
It looks like that's about syntactic ambiguity, whereas the parent is talking semantic ambiguity
kstenerud 1 hour ago|||
Human language is already very efficient for conveying the ideas we have. Some languages are more efficient at conveying certain concepts, but all are able to handle the 90% case. I'd expect any attempts to build a "technical dialect of English" to go about as well as Esperanto.
nextaccountic 1 hour ago||
We already speak in a "technical dialect of English". All we need is some jargon to talk about technical things. (Lawyers have their own jargon too, also chemists, etc)

Some languages don't have this kind of vocabulary, because there aren't enough speakers that deal with technical things in a given area (and those that do, use another language to communicate)

steve_adams_86 3 hours ago|||
The thing is, doesn't the LLM need to be trained on this dialect, and if the training material we have is mostly ambiguous, how do we disambiguate it for the purpose of training?

In my mind this is solving different problems. We want it to parse out our intent from ambiguous semantics because that's how humans actually think and speak. The ones who think they don't are simply unaware of themselves.

If we create this terse and unambiguous language for LLMs, it seems likely to me that they would benefit most from using it with each other, not with humans. Further, they already kind of do this with programming languages which are, more or less, terse and unambiguous expression engines for working with computers. How would we meaningfully improve on this, with enough training data to do so?

I'm asking sincerely and not rhetorically because I'm under no illusion that I understand this or know any better.

manmal 3 hours ago|||
Codex already has such a language. The specs it’s been writing for me are full of “dedupe”, “catch-up”, and I often need to feedback that it should use more verbose language. Some of that has been creeping into my lingo already. A colleague of mine suddenly says the word “today” all the time, and I suspect that’s because he uses Claude a lot. Today, as in, current state of the code.
vrighter 2 hours ago|||
and then someone will come along and say "wouldn't it be nice if this highly specific dialect was standardized?" goto 1
anonzzzies 3 hours ago|||
It was mentioned somewhere else on hn today, but why do I care about token usage? I prompt AI day and night for coding and other stuff via claude code max 200 and mistral; haven't had issues for many months now.
sda2 3 hours ago||
It's a measure of efficiency. One might not care about tokens until vendors jack up the price and running your own comparable model is infeasible.
otabdeveloper4 4 hours ago|||
> optimizes their token usage and LLM spend

Context pollution is a bigger problem.

E.g., those SKILL.md files that are tens of kilobytes long, as if being exhaustively verbose and rambling will somehow make the LLM smarter. (It won't, it will just dilute the context with irrelevant stuff.)

est 4 hours ago|||
> by creating their own more technical dialect of English

Ah, the Lisp curse. Here we go again.

Coincidentally, the '80s AI bubble crashed partly because Lisp dialects aren't interchangeable.

Dylan16807 4 hours ago|||
Lisp doesn't get to claim all bad accidental programming languages are simply failing to be it, I don't care how cute that one quote is.
reverius42 4 hours ago|||
I bet a modern LLM could inter-change them pretty easily.
est 4 hours ago||
trained on public data, yes.

But some random in-house DSL? Doubt it.

noosphr 2 hours ago||
Or they could look at the past few centuries of language theory and start crafting better tokenizers with inductive biases.

We literally have proof that an iron age ontology of meaning as represented in Chinese characters is 40% more efficient than naive statistical analysis over a semi phonetic language and we still are acting like more compute will solve all our problems.

retsibsi 2 hours ago|||
> We literally have proof that an iron age ontology of meaning as represented in Chinese characters is 40% more efficient than naive statistical analysis over a semi phonetic language

Can you elaborate? I think you're talking about https://github.com/PastaPastaPasta/llm-chinese-english , but I read those findings as far more nuanced and ambiguous than what you seem to be claiming here.

umanwizard 2 hours ago|||
> We literally have proof that an iron age ontology of meaning as represented in Chinese characters is 40% more efficient than naive statistical analysis over a semi phonetic language and we still are acting like more compute will solve all our problems.

Post a link because until you do, I’m almost certain this is pseudoscientific crankery.

Chinese characters are not an “iron age ontology of meaning” nor anything close to that.

Also please cite the specific results in centuries-old “language theory” that you’re referring to. Did Saussure have something to say about LLMs? Or someone even older?

TeeWEE 22 minutes ago||
There are two kinds of specs: formal specs, and product requirements / technical designs.

Technical design docs are higher level than code; they are imprecise but highlight an architectural direction. Blanks need to be filled in. AI shines here.

Formal specs == code. Some languages shine at being very close to a formal spec. Yes, functional languages.

But let's first discuss which kind of spec we're talking about.

amtamt 1 hour ago||
> On two occasions I have been asked [by members of Parliament], 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.

I guess many of us qualify for the British Parliament.

jumploops 38 minutes ago|
Funnily enough, with the most recent models (having reduced sycophancy), putting in the wrong assumptions often still leads to the right output.
angry_octet 1 hour ago||
A spec is an envelope that contains all programs that comply. Creating this spec is often going to be harder than writing a single compliant program.

Since every invocation of an LLM may create a different program, just as with people, the spec will leave much room for good and bad implementations, and will highlight its own imprecision.
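
One way to make the "envelope" idea concrete (a sketch with hypothetical names, not from the article): express the spec as a predicate over implementations, so that several different programs all fit inside it.

```python
# A spec as a predicate: it admits every implementation that complies,
# without fixing which one you get.
def spec_sort(f, xs):
    out = f(list(xs))
    is_ordered = sorted(out) == out
    is_permutation = sorted(out) == sorted(xs)
    return is_ordered and is_permutation

# Two different programs, both inside the envelope.
def impl_a(xs):
    return sorted(xs)

def impl_b(xs):  # insertion sort; different code, same observable behavior
    result = []
    for x in xs:
        i = 0
        while i < len(result) and result[i] <= x:
            i += 1
        result.insert(i, x)
    return result

assert spec_sort(impl_a, [3, 1, 2])
assert spec_sort(impl_b, [3, 1, 2])
```

A good implementation and a bad one can both be checked against the same envelope; what the predicate doesn't mention (performance, memory) is exactly the room the spec leaves open.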

Once we start using a particular implementation, it often becomes the spec for subsequent versions, because its interfaces expose surface texture that other programs and people will begin to rely on.

I'm not sure how well LLMs will fare at brownfield software development. There is no longer a clean specification. Regenerating the code from scratch isn't acceptable. You need TPS reports.

causalityltd 3 hours ago||
The cognitive dissonance comes from the tension between the-spec-as-management-artifact and the-spec-as-engineering-artifact. The author is right that advocates are selling the first, but the second is the only one that works.

For a manager, the spec exists in order to create a delegation ticket: something you assign to someone, and then it's done. But for a builder, it exists as a thinking tool that evolves with the code to sharpen understanding.

I also think that some builders are being fooled into thinking like managers because it's easy, but they figure it out pretty quickly.

jumploops 1 hour ago||
In my experience with “agentic engineering” the spec docs are often longer than the code itself.

Natural language is imperfect, code is exact.

The goal of specs is largely to maintain desired functionality over many iterations, something that pure code handles poorly.

I’ve tried inline comments, tests, etc. but what works best is waterfall-style design docs that act as a second source of truth to the running code.

Using this approach, I’ve been able to seamlessly iterate on “fully vibecoded” projects, refactor existing codebases, transform repositories from one language to another, etc.

Obviously ymmv, but it feels like we’re back in the 70s-80s in terms of dev flow.

yes_man 1 hour ago||
> In my experience with “agentic engineering” the spec docs should be longer than the code itself. Natural language is imperfect, code is exact.

The latter notion probably is true, but the former isn't necessarily, because you can map natural language to strict schemas. "Implement an interface for TCP in <language>" is probably shorter than the actual implementation in code.

And I understand my example is pedantic, but it extends to any unambiguous definitions. Of course one can argue that the TCP spec is not deterministic by nature because natural language isn't. But that is not very practical. We have to agree to trust some axioms for compilers to work in the first place.
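
To make the contrast concrete (a toy sketch, not TCP; the name `encode` is hypothetical): the spec fits in one line of strict natural language, while even a trivial compliant implementation takes several.

```python
# Spec (one line): encode(s) returns the run-length encoding of s,
# i.e. a list of (char, run_length) pairs covering s in order.
def encode(s: str) -> list[tuple[str, int]]:
    runs: list[tuple[str, int]] = []
    for ch in s:
        if runs and runs[-1][0] == ch:
            # extend the current run
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            # start a new run
            runs.append((ch, 1))
    return runs

assert encode("aaabcc") == [("a", 3), ("b", 1), ("c", 2)]
```

The one-line spec is unambiguous enough that any correct implementation is interchangeable, which is the sense in which the spec can be shorter than the code.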

jumploops 39 minutes ago||
Thanks, I updated my comment to say “are often longer” because that’s what I see in practice.

To your point, there are some cases where a short description is sufficient and may take equal or fewer lines than the code (frequently with helper functions utilizing well-known packages).

In either case, we’re entering a new era of “compilers” (transpilers?), where they aren’t always correct/performant yet, but the change in tides is clear.

stanac 48 minutes ago||
> The goal of specs is largely to maintain desired functionality over many iterations, something that pure code handles poorly.

IMHO this could be achieved with a large set of tests, but the problem is that if you prompt an agent to fix tests, you can't be sure it won't "fix the test", or implement something just to make the test pass without looking at the larger picture.
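
One partial mitigation (a sketch; `dedupe` and the properties are hypothetical): property-style tests over random inputs are harder to game than example-based tests, because there is no single expected value to hard-code.

```python
import random

def dedupe(xs):
    """Implementation under test: drop duplicates, keep first-occurrence order."""
    seen, out = set(), []
    for x in xs:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out

# An example-based test an agent could "fix" by hard-coding the answer:
assert dedupe([1, 1, 2]) == [1, 2]

# Property-style checks on random inputs: the agent would have to
# satisfy the spec itself, not one memorized example.
rng = random.Random(0)
for _ in range(100):
    xs = [rng.randint(0, 5) for _ in range(rng.randint(0, 20))]
    out = dedupe(xs)
    assert len(out) == len(set(out))  # no duplicates
    assert set(out) == set(xs)        # same elements
    # first-occurrence order preserved
    assert out == [x for i, x in enumerate(xs) if x not in xs[:i]]
```

This doesn't stop an agent from editing the test file itself, but it does remove the cheapest way to make a test pass without implementing anything.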

ACV001 1 hour ago|
I don't agree. The code is much more than the spec. In fact, typical project code is 90% scaffolding and infrastructure to wire everything together, full of implementation details specific to the framework you use, and only 10% or less is actual "business logic". The spec doesn't have to deal with language or framework details, so by definition the spec is the minimum amount of text necessary to express the business logic and behaviour of the system.
duesabati 6 minutes ago|
With ALL due respect (seriously), this is just a misconception of yours.

When you write software to solve a problem, you start with as few details as possible, so that when you read the code it only talks about the business logic. What do I mean by that? That you abstract away anything that does not concern the domain you are in; for example, your first piece of code should never mention any kind of database technology, and it should not contain any reference to a specific communication layer (HTTP, for example), and so on.

When you get to this, you have summarized the "spec", and usually it can be read very easily by a non-technical person (who is, obviously, a domain expert) and it can also be very testable.
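
A minimal sketch of that idea (all names hypothetical): the domain rule mentions no database or HTTP; the infrastructure is supplied by the caller as a plain function.

```python
from typing import Callable

# Pure business rule, readable as the "spec" and testable without
# any database or HTTP in sight. Prices are integer cents.
def apply_discount(cents: int, loyal_years: int) -> int:
    """Loyal customers (3+ years) get 10% off."""
    return cents * 9 // 10 if loyal_years >= 3 else cents

# Infrastructure is injected, so the domain code never names a
# storage or transport technology.
def checkout(cents: int, load_loyal_years: Callable[[], int]) -> int:
    return apply_discount(cents, load_loyal_years())

# In a test, "storage" is a lambda; in production it could wrap a DB call.
assert checkout(10_000, lambda: 5) == 9_000
assert checkout(10_000, lambda: 1) == 10_000
```

Read top to bottom, `apply_discount` is exactly the sentence a domain expert would write, which is the sense in which the first cut of the code *is* the spec.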

I hope this helps explain why the author is 100% right.

More comments...