Posted by ingve 9 hours ago

The Coming Loop (lucumr.pocoo.org)
241 points | 190 comments
weego 2 hours ago|
What does any of that mean in practice? It's just rambling about abstract concepts that seem designed to hint at a bigger picture, when it's just getting AI to write code for you.

Is this where it's going? Having to mystify our roles so it seems like we're still the thought leaders when actually we're just becoming pseudo-teachers that try and herd our group of AI idiots to the right conclusion for us so we don't have to, without ever giving away that it's just all techno-babble?

coldtea 1 hour ago||
Once you buy into the AI hype, you babble like that. Yegge is an even worse example.
fantasizr 1 hour ago|||
Tech blogs used to read like actionable readme guides. I couldn't finish it without thinking: what am I supposed to do with this information? The shelf life of the latest and greatest is about 2 weeks in the AI space. I never caught up to the ralph wiggum loop and now I'm glad I never tried.

https://news.ycombinator.com/item?id=46682325

dofm 1 hour ago|||
From my past experience of religion at various levels I am very often reminded of borderline-cult religious meetings, and the zeal of converts repeating gnomic oversimplifications, and of how exhausting it was to try to engage with them on any topic of substance.

My own feeling is that it is totally OK to simply route around these people.

It's fascinating how many of the "keep your identity small" folks in the YC/HN sphere have lost any sense of perspective at the first sign of a technology that wanders into the philosophical realm. AI-oriented identities are everywhere.

f311a 36 minutes ago|||
> What does any of that mean in practice?

They want you to spend more tokens

AndrewKemendo 6 minutes ago|||
I’m sorry but this response is just absolutely ridiculous and is not giving anything near the respect to the author that you should have.

You’re just rambling and ranting about philosophical things and have basically nothing to say about the technical or engineering points that the author wrote.

This is an entirely emotional appeal and doesn’t actually engage the author where the author is engaging the audience.

If you look further down the thread there are dozens of comments that are engaging with the content and not being hyperbolic about all this cyber shamanism or whatever you wanna call it.

wahnfrieden 1 hour ago|||
Why is HN interested in management and team process discussion but allergic to similar topics on how to manage agents?

It’s like saying why discuss these team workflows when it’s just devs writing code. Or why use any jargon to describe workflows when it's just devs writing code.

SimianSci 2 hours ago||
When someone is expected to be wise and does not have the knowledge to keep up with the needs of those around them, they in turn become Shamanistic in their practice.

The speed of improvement on these models has been incredible and has outpaced the learning speed of humans and put many experts into these Shamanistic roles.

I think the operative means of addressing this is to recognize that we can only learn so quickly, but we are still called to improve our knowledge and understanding to a higher level. Since the improvement of these models is neither logarithmic nor exponential, we currently occupy a space in time in which the models are smarter on average than we are as a collective whole.

watutalkinbout 1 hour ago|||
Algorithms and data that emulate responses aren't smart.

A 5 year old knows if you want to wash your car, you need to take it to the car wash.

coldtea 1 hour ago||
Can a 5 year old write a substantial program on spec, that passes the requirements and given tests, in a few minutes?

If not, then perhaps this comparison is not the be-all and end-all.

"A ship is useless, it can't drive over land..."

JBits 55 minutes ago|||
But it demonstrates that LLMs struggle with basic reasoning. A criticism of LLMs is that they're imitating without an understanding of what they're doing and without a clear plan, so this inability to solve a simple logic puzzle is very relevant. If LLMs didn't struggle with reasoning problems then something like ARC-AGI wouldn't exist.
dominotw 1 hour ago|||
5 year olds and AI both have jagged intelligence.

Also, it's AI, not "artificial code generation intelligence". The ship is your view of the product, shoehorned into something specific.

mccoyb 8 hours ago||
Loops work when you spend the proper amount of time to understand what you want ahead of time. The prerequisite is clarity — enough clarity that you could write a careful specification that you could hand off to a junior colleague.

Often, it takes 5-6 broken crappy versions of a thing until you understand that. There is no accelerating the 5-6 broken crappy versions - there’s no agent tech that’s going to help your meat brain avoid thinking time.

So most of my time is iterating between these two phases: I don’t understand what I want, I need to read and write and play with code, okay it’s been long enough I think I know what I want (it is extremely easy to deceive yourself) … okay now I do actually know what I want and I can write a loop.

Many people think they can jump ahead with agents. You cannot fake understanding or clarity. It is painfully obvious when someone skipped that meat brain understanding phase.

athrowaway3z 7 hours ago|
I had codex write a tool to extract all my pi sessions. (Had to filter out my prompts from the agents talking to subagents).

Then I had it analyze the patterns I was making and turned that into the flowchart for the outer guidance-creating-prompt.

I didn't have to spend too much time thinking about what I wanted. I wanted it to do that.

The result is still mixed, and I'm not trusting it with delicate code bases, but for a game I've been building I dropped my check-in time to 1/5th of what I was previously spending on it.

That's not a good thing per se. I'm sure I'm missing good ideas by _not_ spending time with it. But previously I really had stagnated, with my prompts becoming mechanical #now-do-this and #now-review-that and 90% of its suggestions being correct.

Just need to (automatically) remind it to "do the hard stuff first, clean up & refactor as you go" as well as a "reflect on your work" after its first return to get it to spill the beans on any crap left behind, and then process that in the guidance-creating-prompt to dish out new work.
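
A rough sketch of what that outer loop could look like, not the actual tool: `agent` and `plan_next` are hypothetical callables standing in for the coding agent and the guidance-creating prompt, and the prompt strings only paraphrase the reminders above.

```python
from typing import Callable

REMINDER = "Do the hard stuff first, clean up & refactor as you go."
REFLECT = "Reflect on your work: what did you leave unfinished or hacky?"


def guidance_loop(
    agent: Callable[[str], str],      # hypothetical: sends a prompt, returns the reply
    first_task: str,
    plan_next: Callable[[str], str],  # hypothetical: turns a reflection into the next task
    max_rounds: int = 3,
) -> list[str]:
    """Run work -> reflect -> plan-next rounds and return the reflections."""
    reflections: list[str] = []
    task = first_task
    for _ in range(max_rounds):
        agent(f"{task}\n\n{REMINDER}")  # work pass, with the standing reminder
        reflection = agent(REFLECT)     # second pass: ask it to spill the beans
        reflections.append(reflection)
        task = plan_next(reflection)    # guidance step dishes out new work
        if not task:                    # empty string means nothing left to do
            break
    return reflections
```

The `max_rounds` cap is there so a loop that never converges still stops and waits for a human check-in.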

galaxyLogic 41 minutes ago||
I think what is going to happen is revival of "Methodology".

"Methodology" was a big thing in the past, just before we got into "Agile Extreme Coding": instead of trying to model the big picture of SW development projects, just jump into coding in an agile way and implement it feature by feature.

Granted, the methodologies proposed (see: https://www.ibm.com/docs/en/rational-soft-arch/9.7.0?topic=m... ) may have been too heavy, not flexible enough, and not improved enough. But now with the rise of agents I think we need to revise and perhaps re-invent them for AI agentic development.

mmillin 7 hours ago||
>Yet even with a lot of manual steering, that type of code does not come out of LLMs naturally, and even if the code comes out naturally like that, they will still attempt to handle now impossible errors.

This is something I’ve struggled to fight against in many PR reviews. Especially once already written, convincing someone that their excessive null checking is harmful is an uphill battle. Short of better modeling (and languages that allow for sum types to enable it), I haven’t been able to come up with a universally convincing argument against this kind of “shotgun parsing.”

Maybe it really just isn’t that big of a deal? But when actually reading through and refactoring a codebase I’ve always found it frustrating to manage these unnecessary checks. Sometimes they’re nearly impossible to delete safely once present without first adding some kind of logging or broad investigation.
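
A minimal sketch of the alternative, in Python since that is what comes up elsewhere in the thread. All names here (`User`, `parse_user`, `greeting`) are hypothetical, made up for illustration. The idea is to reject malformed input once at the boundary and hand the interior a type that cannot hold the null, so inner functions have nothing left to check.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class User:
    # Interior type: both fields are guaranteed present once constructed.
    name: str
    email: str


def parse_user(raw: dict[str, Optional[str]]) -> User:
    """Boundary: the single place where missing fields are rejected."""
    name = raw.get("name")
    email = raw.get("email")
    if not name or not email:
        raise ValueError(f"malformed user record: {raw!r}")
    return User(name=name, email=email)


def greeting(user: User) -> str:
    # No None checks here: the type already rules them out.
    return f"Hello {user.name} <{user.email}>"
```

With this shape, a null check inside `greeting` is visibly dead code, which makes it much easier to argue for deleting it in review.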

handoflixue 6 hours ago||
How impossible are we talking?

I tend to be a fairly defensive programmer - maybe nothing currently sends this function a negative value, but how hard is it for a future code change to alter that assumption? I always figured a clear error was best. It lets even someone unfamiliar with the code know what assumptions are being made about the valid range of inputs, so they don't have to consider impossible outliers.

mmillin 5 hours ago||
Obviously it isn’t totally impossible, but it becomes challenging to know if it’s required or not. It’s hardest when it isn’t just throwing an error but instead defaulting to something only half-sensible. For example, replacing a negative number with 0 or overflowing rather than panicking.

When it comes to assumptions about the input, ideally model them in the type system. If you can’t, explicit checks and throws are OK in my book. But don’t check-and-hide any errors. You’ll be hard pressed to debug the issues they’ll cause down the road, since you will usually see the impact far from the implementation.
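
To make the distinction concrete, a small sketch with hypothetical function names (`withdraw_hidden`, `withdraw_strict`); the first is the check-and-hide pattern, the second is the explicit check-and-throw:

```python
def withdraw_hidden(balance: int, amount: int) -> int:
    # Check-and-hide: a negative amount is silently replaced with 0,
    # so the caller's bug surfaces much later, far from here.
    amount = max(amount, 0)
    return balance - amount


def withdraw_strict(balance: int, amount: int) -> int:
    # Explicit check-and-throw: the assumption is stated and enforced
    # at the point where it is violated.
    if amount < 0:
        raise ValueError(f"amount must be non-negative, got {amount}")
    return balance - amount
```

Both handle the "impossible" input, but only the strict version tells you it happened.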

datadrivenangel 7 hours ago||
And AI code reviews encourage overly delusional defensive paranoia. Triple null checking deep inside a function guards against what is technically a real risk, but in practice the check should never be hit, because you've already checked for nulls in every function that calls or could call the function in question, so it is not necessarily worth guarding against.
Multicomp 6 hours ago||
Code is part of a shared and built understanding of an information system.

If these loopers mean we all have to move with this continuous wave of software happening, then we get to the highest levels of logical information system design, and it's all human judgement and balancing of business requirements to fit a given niche in a company or market. So all the programmers have to become business analysts/market researchers/businessmen...except in the specific niches where AI tooling can't really clank well...or until the end of the subsidized AI token era makes all this looping too expensive to continue. This feels like expert systems and Symbolics Lisp machines redux, where we briefly ran into the fact that it's not so much the code itself not being able to do stuff, it's that your company's org always gets shipped, so if you can't change your company org, your software only has so much flexibility.

Dataflow diagrams and domain knowledge / domain modeling / ubiquitous languages may become the metalanguage that we start to use to set the quality, functional, and non-functional standards and conventions. We make the "looper clankers" ensure that they fulfill those data / behavior / performance contracts before saying what "done" is, because "done" is no longer just code that compiles, code that builds, code that deploys, or even code that sits in production; it's code that fulfills all of the user requirements, operator requirements, and maintainer requirements. So, the language used may require us all to turn into business analysts and software architects more than syntax knowers. The revenge of UML and the return of declarative / logical design / BDD triumphing?

(Typo scan by gemma4-12b but I didn't let it alter my message)

lifeisstillgood 1 hour ago||
>>> For now I have not moved past the point of comprehension being important to me.

I see software as a new form of literacy, even in the AI world, so yeah, in my world view, comprehension will be something we always cling to.

I might comprehend some code the way I comprehend the newspaper article on the second page, others I comprehend like a Dylan Thomas poem. My attention might be different but I still need to understand it.

wseqyrku 2 hours ago||
For some reason pro-AI blog posts feel like paid ads. I might be wrong.
tfrancisl 2 hours ago||
I can't blame you when the first few sentences almost always invoke one of the "creators of XYZ" (I don't know how you can say a model or model harness has a singular creator when the model was trained on everyone's data and the harness was built by a whole team?) and treat their word or experience as gospel.

Who cares what Cherny thinks? He is selling his product, and he will probably cash out soon enough while his credibility is as high as it is.

shikshake 2 hours ago|||
To me there’s usually an undercurrent of manic zeal that makes them feel that way.
rolisz 2 hours ago||
Armin is very nuanced and balanced. He spells out clearly in the blog post the bad parts of AI.
boscillator 7 hours ago||
> the right fix is not "handle every malformed case." ... [LLMs] will still attempt to handle now impossible errors.

This is the number one code smell from LLMs and I don't know why they are so obsessed with it. In Python, it often comes as `hasattr` checks on types that are defined to have that attribute, in a code base that is fully type-checked.

Why do they do that? Is it from pre-training or reinforcement? If the latter, can the labs please fix this?
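
For anyone who hasn't seen it, a minimal sketch of the smell, using a hypothetical `Point` dataclass and made-up function names:

```python
from dataclasses import dataclass


@dataclass
class Point:
    x: float
    y: float


def norm_defensive(p: Point) -> float:
    # The smell: guarding attributes the type already guarantees.
    x = p.x if hasattr(p, "x") else 0.0
    y = getattr(p, "y", 0.0)
    return (x * x + y * y) ** 0.5


def norm_direct(p: Point) -> float:
    # Trust the annotation; a wrong argument fails loudly with AttributeError.
    return (p.x * p.x + p.y * p.y) ** 0.5
```

The defensive version is not just noise: pass it the wrong object and it quietly returns 0.0 instead of raising, which hides exactly the bug the type checker was supposed to catch.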

rzmmm 7 hours ago||
Likely just that they err on the side of unnecessary error handling rather than missing error handling. They likely penalize runtime errors harshly in training.
ambicapter 3 hours ago|||
Because the vast majority of the codebases in its training set aren't fully type-checked, or very clean at all. Or it's just snippets from Stack Overflow, so there's no existing context to not assume null-checking is valid.
jerf 7 hours ago|||
I suspect it's mostly the training data. I am also on team "make illegal states unrepresentable". It may get talked about a lot on HN, but I'm still at the point that I'm surprised when I see a code base that I didn't write in the wild that does a really good job of it, either open source or at work. Most programmers still think in terms of picking up pieces and fixing errors at the point where the error message pops out rather than making it so the error can't happen and the data reflects that.

I say "mostly" because I think there's also a problem with AIs thinking this way in their current state. That last level of human understanding of a code base, where the human holistically understands the flow of those guarantees, is a challenge to give them right now. On the raw code level, this sort of thing often involves enough code to easily blow out their context window. Trying to summarize it in memories-style files has its own problems; just because there is text written down about the guarantees doesn't mean that the AI is going to get the right info out of it, any more than a human might from just reading the code. I won't say it's "impossible" to give an AI this understanding because I'm not sure it is, but it is a level of understanding of the code that even if you get them to have it, their practices tend to fight against it.

My own solution to this problem has largely been to give up on them getting this. I prompt a solution to the problem the way that most people do, then if I want to make bad illegal states unrepresentable I prompt the AI through the process of the necessary refactorings, unless it's so small that I just do it myself. Given a lot of code that uses maps/dicts and arrays and strings and ints, if you prompt it through making those more thoroughly typed, it's actually pretty good at it. I've not had a lot of luck getting good designs out of single prompts, even when I get detailed. Treating it as two separate tasks seems to work out well.

And watch the diffs on the types carefully; AI loves to sneak past a ".JustSetItAndIgnoreAllThePreAndPostConditions(string)" method. After all, I suspect there's plenty of training data of "types that are nicely structured to make error states unrepresentable and then a later maintainer came along and added a 'JustEffingDoIt' method that broke everything" in the field. One of the best defenses is to make sure that the type implementing these things is in its own file and you can easily look at all the methods it adds on those types and smack it when it does that. I've tried slathering warnings about not doing this and explaining the pre- and post-conditions being maintained in the docs but the change seems marginal.
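
A small sketch of the shape I mean, in Python for illustration. `Percentage` and the commented-out `just_set_it` are hypothetical names, not from any real code base:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Percentage:
    """Invariant: 0 <= value <= 100, enforced on every construction."""
    value: int

    def __post_init__(self) -> None:
        if not 0 <= self.value <= 100:
            raise ValueError(f"percentage out of range: {self.value}")

    def increased_by(self, delta: int) -> "Percentage":
        # Returns a new instance, so the range check runs again.
        return Percentage(self.value + delta)

    # The kind of method to watch for in diffs (hypothetical name):
    #   def just_set_it(self, value):
    #       object.__setattr__(self, "value", value)
    # It bypasses __post_init__ and makes the illegal state representable again.
```

Because every method lives on the one class in the one file, a new escape hatch like that shows up as a single obvious hunk in the diff.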

efromvt 2 hours ago|||
million times this - getattr on every dataclass is a wild choice
CuriouslyC 6 hours ago|||
Sorry to say but the solution is to stop using Python. The models are trained to code defensively assuming historically representative Python codebases. The models trust the types a lot more in languages where the canonical historical examples trust the types because the language is constructed around that premise.
zahlman 21 minutes ago||
I would expect a language model to do a better job of coping with that kind of uncertainty, inferring type from name and usage, etc.
skywhopper 7 hours ago||
It’s because it matches the patterns they are trained to follow. They don’t understand the code. They can’t reason about the actual logic flow. They can only work with patterns.
CraigJPerry 8 hours ago||
> My current status is that I have not had much success with this way of working for code I deeply care about

If something is judgement heavy, "code i care deeply about", then i don't really agree with the direction of travel here. Don't try to delegate decisions you care deeply about.

I do like the framing of agent loop vs harness loop, but only delegate stuff that you can accurately specify in advance, that usually means stuff that's repeatable in my case ("hey go see how i did X, do that but for Y"), and that inherently means stuff that's predictable.

For stuff where lack of my judgement as input is just going to cause me to say "no", we're down to collaborating in the "agent loop" as Armin puts it. And that's totally fine. It's fast, but also safe.

Remember before AI coding assistants, sometimes you'd get an engineer join your team who was SUPER productive, and your peers would be jealous: "oh yeah but you guys only got all that done because you have X on your team!" They didn't live the curse of having that kind of person around: if you don't have them PERFECTLY aligned, then they run off at breakneck speed in the wrong direction.

zahlman 13 minutes ago|
> Don't try to delegate decisions you care deeply about.

> they didn't live the curse of having that kind of person around - if you don't have them PERFECTLY aligned, then they run off at breakneck speed in the wrong direction.

Exactly. If you wouldn't outsource it to people you considered highly skilled, why would you outsource it to a machine?

tmshapland 1 hour ago|
Thank you for writing this thoughtful post, Armin. I find it deeply comforting that the developer of Pi, an agent harness, does not remove himself from the loop, just as I don't. Maybe if I started thinking of codebases as biological organisms I could get comfortable with getting the human out of the loop.