- -dangerously-skip-reading-code

Posted by fagnerbrack 10 hours ago

- -dangerously-skip-reading-code(olano.dev)

44 points | 55 commentspage 2

debesyla 4 hours ago|

I found that adding "philosophy" descriptions help guide the tooling. No specs, just general vibes what's the point, because we can't make everyone happy and it's not a goal of a good tool (I believe).

Technology, implementation may change, but general point of "why!?" stays.

hombre_fatal 3 hours ago||

Yeah, this has been my process for months now.

I might even start my own blog to write about things I've found.

1. Always get the agent to create a plan file (spec). Whatever prompt you were going to yolo into the agent, do it in Plan Mode first so it creates a plan file.

2. Get agents to iterate on the plan file until it's complete and thorough. You want some sort of "/review-plan <file>" skill. You extend it over time so that the review output is better and better. For example, every finding should come with a recommended fix.

3. Once the plan is final, have an agent implement it.

4. Check the plan in with the impl commit.

The plan is the unit of work really since it encodes intent. Impl derives from it, and bugs then become a desync from intent or intent that was omitted. It's a nicer plane to work at.

From this extends more things: PRs should be plan files, not code. Impl is trivial. The hard part is the plan. The old way of deriving intent from code sucked. Why even PR code when we haven't agreed on a plan/intent?

This process also makes me think about how code implementation is just a more specific specification about what the computer should do. A plan is a higher level specification. A one-line prompt into an LLM is the highest level specification. It's kinda weird to think about.

Finally, this is why I don't have to read code anymore. Over time, my human review of the code unearthed fewer and fewer issues and corrections to the point where it felt unnecessary. I only read code these days so I can impose my preferences on it and get a feel for the system, but one day you realize that you can accumulate your preferences (like, use TDD and sum types) in your static prompt/instructions. And you're back to watching this thing write amazing code, often better than what you would have written unless you have maximum time + attention + energy + focus no matter how uninteresting the task, which you don't.

wizzwizz4 6 hours ago||

> There would need to be automated pull-request checks verifying not only that tests pass but that code conforms to the spec.

As I understand, this is an unsolved problem.

InsideOutSanta 4 hours ago||

Step 1: solve the halting problem.

soraminazuki 4 hours ago|||

Yep, calling it an "unsolved" problem is a misnomer. We already have mathematical proof that it's impossible.

But that aside, it's such a shame that many drinking the AI Kool-Aid aren't even aware of the theoretical limits of a computer's capabilities.

mjd 3 hours ago||

This sort of theoretical result is not always as clear-cut as you suggest.

Computers are finite machines. There is a theorem that although a machine with finite memory can add, multiplication requires unbounded memory. Somehow we muddle along and use computers for multiplication anyway.

More to your point there is a whole field of people who write useful programs using languages in which every program must be accompanied by a proof that it halts on all inputs.

(See for example https://lean-lang.org/ or David Turner's work on Total Functional Programming from about 20 years ago.)

Other examples are easy to find. The simplex algorithm for linear optimization requires exponential time in general, and the problem it solves is NP-hard, but in practice works well on problems of interest and is widely used. Or consider the dynamic programming algorithms for problems like subset-sum.

Theory is important, but engineering is also important.

za_creature 2 hours ago|||

> There is a theorem that (...) multiplication requires unbounded memory

What theorem is that?

The multiplication of any two integers below a certain size (called "words") fits in a "double word" and the naive multiplication algorithm needs to store the inputs, an accumulator and at most another temporary for a grand total of 6*word_size

Sure, you can technically "stream" carry-addition (which is obvious from the way adders are chained in ALU-101) and thus in a strict sense addition is O(1) memory but towards your final point:

> Theory is important, but engineering is also important.

In practice, addition requires unbounded memory as well (the inputs). And it's definitely compute-unbounded, if your inputs are unbounded.

I dislike the term "we muddle along". IEEE 754 has well specified error bars and cases, and so does all good data science. LLMs do not, or at least they do not expose them to the end user

So then, how exactly do we go about proving that the result of chaining prompts is within a controllable margin of error of the intended result? Because despite all the specs, numerical stability is the reason people don't write their own LAPACK.

soraminazuki 3 hours ago|||

But it's not like these systems make theory go away, they make compromises. So the question is, what's the compromise required for an algorithm that can check the conformance of computer programs to natural language specifications that doesn't involve hoping for the best?

wizzwizz4 33 minutes ago||

Natural language specifications often aren't specifications at all: interpreting them requires context that is not available to the computer, and often not even available to the specification's authors without further research / decision-making.

LLMs address this problem by just making things up (and they don't do a great job of comprehending the natural language, either), which I think qualifies as "hoping for the best", but I'm not sure there is another way, unless you reframe the problem to allow the algorithm to request the information it's missing.

wizzwizz4 4 hours ago|||

You're probably thinking of Rice's theorem (a sort of generalised halting problem), but this task is actually way easier than that, since we're not trying to study arbitrary algorithms: we're trying to study the subset of human-comprehensible algorithms. Most of the things we want computers to do are things that, given enough time, someone can write a program to solve: and generally, those programs are not tricksy or meta, and don't involve unsolved mathematics problems found by studying busy-beaver candidates. If it's possible for a human to understand how a program works (which is a necessary component of writing such a program), it's possible to write a mathematical proof about the program's behaviour, and that means it's in principle possible to automate the construction of that mathematical proof, which is equivalent to the "determine whether code conforms to the spec" task.

"Somewhat easier than an impossible task" is not a particularly strong claim about when (or whether) this problem will be solved, though.

soraminazuki 3 hours ago|||

We can't create an algorithm determining whether a computer program halts or not, but we can write one that checks whether it conforms to natural language specifications much more easily? That makes no sense. There's no exception to the halting problem regarding "human comprehensible" computer programs.

__s 1 hour ago|||

They're saying most useful programs don't fall in the complete / correct divide. You can get a lot done while restricting yourself to provable programs

wizzwizz4 1 hour ago|||

Rice's theorem says that we can't draw a partition between two sets of programs, based on their semantic properties. It says nothing about drawing a partition slightly to one side of the desirable partition, misclassifying some tricksy cases, but correctly classifying all the programs we care about.

ElectricalUnion 2 hours ago|||

I will give you a class of programs humans wrote and they want improved: LLMs.

Those were written by humans, and don't involve unsolved mathematics.

Is your claim tht you just need to solve comprehensibility of LLMs?

Figuring out epistemology and cognition to have a chance to reason about the outputs of a LLM seems to me way harder that traditional attempts to reason directly about algorithms.

Ecys 6 hours ago||

this is actually precisely what humans' roles will be.

"is this implementation/code actually aligned with what i want to do?"

humanic responsibility's focus will move entirely from implementing code to deciding whether it should be implemented or not.

u probably mean unsolved as in "not yet able to be automated", and that's true.

if pull-request checks verifying that tests are conforming to the spec are automated, then we'd have AGI.

lefra 3 hours ago|||

That's what formal verification is about. I did some (using PSL for hardware verification); writing the formal spec is way harder than the actual code. It will find a lot of subtle issues, and you spend a most of the time deciding if it's the spec or the code that's wrong.

Having the code-writing part automated would have a negligible impact on the total project time.

phainopepla2 4 hours ago||||

> humanic

No, thank you

wizzwizz4 6 hours ago|||

This is a task that humans are exceptionally bad at, because we are not computers. If something uses the right words in the right order such that it communicates the correct algorithm to a human, then a human is likely to say "yup, that's correct", even if an hour's study of these 15 lines reveals that a subtle punctuation choice, or a subtle mismatch between a function's name and its semantics, would reveal that it implements a different algorithm to the expected one.

LLMs do not understand prose or code in the same way humans do (such that "understand" is misleading terminology), but they understand them in a way that's way closer to fuzzy natural language interpretation than pedantic programming language interpretation. (An LLM will be confused if you rename all the variables: a compiler won't even notice.)

So we've built a machine that makes the kinds of mistakes that humans struggle to spot, used RLHF to optimise it for persuasiveness, and now we're expecting humans to do a good job reviewing its output. And, per Kernighan's law:

> Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it?

And that's the ideal situation where you're the one who's written it: reading other people's code is generally harder than reading your own. So how do you expect to fare when you're reading nobody's code at all?

Ecys 5 hours ago||

i meant on a higher, agentic level where the AI's code is infallible. and that's going to happen very soon:

say: human wants to make a search engine that money for them.

1. for a task, ask several agents to make their own implementation and a super agent to evaluate each one and interrogate each agent and find the best implementation/variable names, and then explain to the human what exactly it does. or just mythos

2. the feature is something like "let videos be in search results, along with links"

3. human's job "is it worth putting videos in this search engine? will it really drive profits higher? i guess people will stay on teh search engine longer, but hmmm maybe not. maybe let's do some a/b testing and see whether it's worth implementing???" etc...

this is where the developer has to start thinking like a product manager. meaning his position is abolished and the product manager can do the "coding" part directly.

now this should be basic knowledge in 2026. i am just reading and writing back the same thing on HN omds.

wizzwizz4 4 hours ago||

The AI's code is not going to be infallible any time soon. It's been "very soon" for the past 4 years, and the AI systems are still making the same kinds of mistakes, which are the mistakes you'd expect from a first-principles study of their model architectures. There's no straightforward path to modifying the systems we have now, to make them infallible.

testplzignore 3 hours ago||

> Product owners and engineers could initially collaborate on this spec and on test cases to enforce business rules.

LOL. I had to check if this was published on April 1st.

farmerbb 1 hour ago||

I legit can't tell if this article is satire, or not.

lesscode 3 hours ago||

Instead of accepting 20,000 lines of slop per PR (and never-ending combinatorial complexity), maybe we should aim to think about abstractions and how to steer LLMs to generate code similar to that of a skilled human developer. Then it could actually be a maintainable artifact by humans and LLMs alike.

crnkofe 3 minutes ago|

I don't get why every AI article is so hyper-focused on coding speed. If the coding is so fast doesn't it make sense to invest more time into quality, learning, documentation, testing refactoring, making a better product? I'm beginning to think that the slopcoders are evaluated by kLOCs of lines written in addition to LLM token usage and they're just maximising the measured metrics. Whether that actually ends up in production or is used by any real person is seemingly irrelevant. Likely the more bugs that are produced the more agents can be spun in parallel to simulate busywork.

Ecys 6 hours ago||

very true. and we already know and agree with this.

user experience/what the app actually does >>> actually implementing it.

elon musk said this a looong time ago. we move from layer 1 (coding, how do we implement this?) to layer 2 thinking (what should the code do? what do we code? should we implement this? (what to code to get the most money?))

this is basic knowledge

duskdozer 4 hours ago|

Elon Musk has been saying Teslas would have fully autonomous self-driving within 1-3 years since 2013

vinnymac 2 hours ago||

I left a similar comment elsewhere in this thread. I still remember when so many people hallucinated that we would suddenly have flying cars by 2002 at the latest. If we achieve several more major improvements on current technology, these thoughts are interesting to consider. But not before that occurs.

We need the pragmatic engineer more than ever.

fijiol 3 hours ago|

[flagged]