
Posted by bblcla 1/14/2026

Claude is good at assembling blocks, but still falls apart at creating them (www.approachwithalacrity.com)
315 points | 237 comments | page 2
Scrapemist 1/15/2026|
Eventually you can show Claude how you solve problems and explain the thought process behind it. It can apply these learnings, but it will encounter new challenges in doing so. It would be nice if Claude could initiate a conversation to go over those issues in depth; right now it just wants quick confirmation so it can plough ahead.
fennecbutt 1/15/2026|
Well, I feel like this is because a better system would distill such learning into tokens not tied to any human language, which could represent the logic better than English etc.

I don't have the GPUs or time to experiment though :(

Scrapemist 1/15/2026||
Yes, but I would appreciate it if it used English to explain its logic to me.
0xbadcafebee 1/16/2026||
I don't think it's possible to make an AI a "Senior Engineer", or even a good engineer, by training it on random crap from the internet. It's got a million brains' worth of code in it. That means bad patterns as well as good. You'd need to remove the bad patterns for it not to "remember" and regurgitate them. I don't think prompts help with this either, it's like putting a band-aid on head trauma.
HarHarVeryFunny 1/16/2026|
It's also rather like trying to learn flintknapping just by looking at examples of knapped flint (maybe some better than others), rather than having access to descriptions of how to do it, let alone any practice actually doing it.

You could also use cooking as an analogy - trying to learn to cook by looking at pictures of cooked food, rather than by going to culinary school and learning the principles of how to actually plan and cook good food.

So, we're trying to train LLMs to code, by giving them "pictures" of code that someone else built, rather than by teaching them the principles involved in creating it, and then having them practice themselves.

Havoc 1/16/2026||
> Claude can’t create good abstractions on its own

LLMs definitely can create abstractions and boundaries. E.g. most will lean towards a pretty clean frontend vs backend split even without hints. Or work out a data structure that fits the need. Or split things into free-standing modules. Or structure a plan into phases.

So this really just boils down to "good" abstractions, which is subject to model improvement.

I really don’t see a durable moat for us meatbags in this line of reasoning

HarHarVeryFunny 1/16/2026|
There's a difference between "can generate" and "can create [from scratch]". Of course LLMs can generate code that reflects common patterns in the stuff they were trained on, such as frontend/backend splits, since this is precisely what they are trained to do.

Coming up with a new design from scratch, designing (or understanding) a high level architecture based on some principled reasoning, rather than cargo cult coding by mimicking common patterns in the training data, is a different matter.

LLMs are getting much better at reasoning/planning (or at least something that looks like it), especially for programming & math, but this is still based on pre-training, mostly RL, and what they learn obviously depends on what they are trained on. If you wanted LLMs to learn principles of software architecture and abstraction/etc, then you would need to train on human (or synthetic) "reasoning traces" of how humans make those decisions, but it seems that currently RL-training for programming is mostly based on artifacts of reasoning (i.e. code), not the reasoning traces themselves that went into designing that code, so this (coding vs design reasoning) is what they learn.

I would guess that companies like Anthropic are trying to address this paucity of "reasoning traces" for program design, perhaps via synthetic data, since this is not something that occurs much in the wild, especially as you move up the scale of complexity from small problems (student assignments, Stack Overflow advice) to large systems (which are anyway mostly commercial, hence private). You can find a smallish number of large open source projects like gcc and linux, but what is missing are the reasoning traces of how the designers went from requirements to designing these systems the way they did (sometimes in questionable fashion!).
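
To make the artifact-vs-trace distinction concrete, here is a rough sketch of what a design-level "reasoning trace" training example might look like; the format and every field name are invented for illustration, not taken from any real dataset:

    # Hypothetical shape of a "reasoning trace" training example; all fields invented.
    design_trace = {
        "requirement": "ingest 10k events/sec, queryable within 5 seconds",
        "decisions": [
            {"question": "queue or direct writes?",
             "choice": "queue",
             "why": "tolerates bursts and decouples producers from storage"},
            {"question": "where is the storage boundary?",
             "choice": "append-only log plus a separate read model",
             "why": "keeps the write path simple; the read model can be rebuilt"},
        ],
        "artifact": "ingest.py",  # the code itself is only the end product
    }

The point is that today's training data mostly contains only the last field.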

Humans of course learn software architecture in a much different way. As with anything, you can read any number of books, attend any number of lectures, on design principles and software patterns, but developing the skill for yourself requires hands-on personal practice. There is a fundamental difference between memory (of what you read/etc) and acquired skill, both in level of detail and fundamental nature (skills being based on action selection, not just declarative recall).

The way a human senior developer/systems architect acquires the skill of design is by practice, by a career of progressively more complex projects, successes and failures/difficulties, and learning from the process. By learning from your own experience you are of course privy to your own prior "reasoning traces" and will learn which of those lead to good or bad outcomes. Of course learning anything "on the job" requires continual learning, and things like curiosity and autonomy, which LLMs/AI do not yet have.

Yes, we senior meatbags will eventually have to compete with, or be displaced by, machines that are the equal of us (which is how I would define AGI), but we're not there yet, and I'd predict it's at least 10-20 years out, not least because it seems most of the AI companies are still LLM-pilled and are trying to cash in on the low-hanging fruit.

Software design and development is a strange endeavor since, as we have learnt, one of the big lessons of LLMs (in general, not just apropos coding) is how much of what we do is (trigger alert) parroting to one degree or another, rather than green-field reasoning and exploration. At the same time, software development, as one gets away from boilerplate solutions to larger custom systems, is probably one of the more complex and reasoning-intensive things that humans do, and therefore may end up being one of the last, rather than the first, to completely fall to AI. It may well be AI managers, not humans, who finally say that at last AI has reached human parity at software design, able to design systems of arbitrary complexity based on principled reasoning and accumulated experience.

Havoc 1/16/2026||
Certainly there is space for designs an LLM can't come up with, but let's be real: senior developers are not routinely cranking out never-before-seen novel architectures, any more than physicists are coming up with never-thought-of theories that work every week.

It's largely the same patterns & principles applied in a tailored manner to the problem at hand, which LLMs can...with mixed success...do.

>human parity

Impact is not felt when the hardest part of the problem is cracked, but rather when the easy parts are.

If you have 100 humans making widgets and the AI can do 75% of the task, then you've suddenly got 4 humans competing for every remaining widget job. This is going to be Lord of the Flies in the job market long before human parity.
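
Spelling out that arithmetic (numbers purely illustrative):

    # Illustrative only: if AI absorbs 75% of the work, the same output needs far fewer people.
    workers = 100
    ai_share = 0.75
    remaining_jobs = workers * (1 - ai_share)      # 25 widget jobs left
    candidates_per_job = workers / remaining_jobs  # 4 people per remaining job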

>I'd predict it's at least 10-20 years out

For AGI, probably, but I think by 2030 this will have hit society like a freight train...and hopefully we'll have figured out what we want to do about it by then too. UBI or whatever...because we'll have to.

HarHarVeryFunny 1/17/2026||
> Certainly there is space for designs an LLM can't come up with, but let's be real: senior developers are not routinely cranking out never-before-seen novel architectures, any more than physicists are coming up with never-thought-of theories that work every week.

True, but I didn't mean to focus on creativity, just on the nature of what can be learned when all you have to learn from is the artifacts of reasoning (code), not the underlying reasoning traces themselves (the reasoning process behind why the code was designed that way). Without reasoning traces you get what we have today, where AI programming in the large comes down to cargo-cult code-pattern copying, without understanding whether the (unknown) design process that led to the patterns being copied reasonably applies to the requirements/situation at hand.

So, it's not about novelty, but rather about having the reasoning traces (for large structured projects) available to learn when to apply design patterns that are already present in the training data - to select design patterns based on a semblance of principled reasoning (RL for reasoning traces), rather than based on cargo cult code smell.

> This is going to be Lord of the Flies in the job market long before human parity.

Perhaps, and that may already be starting, but I think that until we get much closer to AGI you'll still need a human in the loop (both to interact with the AI, and to interact with the team/boss), with AI as a tool not a human replacement. So, the number of jobs may not decrease much, if at all. It's also possible that Jevons paradox applies and that the number of developer jobs actually increases.

It's also possible that human-replacement AGI is harder to achieve than widely thought. For example, maybe things like emotional intelligence and theory of mind are difficult to get right, and without them AI never quite cuts it as a fully autonomous entity that people want to deal with.

> UBI or whatever...because we'll have to.

Soylent Green?

Havoc 1/17/2026||
Re reasoning traces - not sure, frankly. I get what you're saying, in that there is only so much advanced thinking you can learn from just scraping GitHub code, and it certainly seems to be the latest craze for getting a couple of extra % on benchmarks, but I'm not entirely convinced it is necessary per se. It feels like a human-emulation crutch to me rather than a necessary ingredient for machines performing a task well.

For example, I could see some sort of self-play-style RL working. Which architecture? Try them all in a sandbox and see. Humans need trial-and-error learning, as you say, so why not here too? It seems to have worked for AlphaGo, which arguably also contains components of abstract high-level strategy.
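
A rough sketch of what that sandbox-style trial and error could look like (the candidate list and the scoring function here are pure placeholders; a real setup would generate a prototype and benchmark each option):

    # Hypothetical sketch of "try them all in a sandbox and see".
    import random

    candidate_architectures = ["monolith", "queue + workers", "event-sourced", "plain CRUD + cache"]

    def sandbox_score(architecture: str) -> float:
        """Stand-in for: build a prototype, run tests/benchmarks in a sandbox, return a reward."""
        return random.random()  # placeholder reward; a real loop would measure passing tests, latency, etc.

    best = max(candidate_architectures, key=sandbox_score)
    print("selected architecture:", best)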

>Jevons paradox

I can see it for tokens and possibly for software too, but I'm rather skeptical of it in the job-market context. It doesn't seem to have happened for the knowledge work AI has already killed (e.g. translation or, say, copywriting). More (slop) stuff is being produced, but it didn't translate into a hiring frenzy for copywriters. It's possible that SWE is somehow different via network effects or something, but I've not heard a strong argument for it yet.

>It's also possible that human-replacement AGI is harder to achieve than widely thought.

Yeah, I think the current paradigm isn't gonna get us there at all. Even if you 10x GPT5, it still seems to miss some sort of spark that a 5-year-old has and GPT doesn't. It can do PhD-level work, but qualitatively there is something missing there about that "intelligence".

Interesting times ahead for better or worse

iamacyborg 1/15/2026||
Here’s an example of a plan I’m working on in CC. It’s very thorough, though it required a lot of handholding and fact-checking on a number of points, as its first few passes didn’t properly anonymise data.

https://docs.google.com/document/u/0/d/1zo_VkQGQSuBHCP45DfO7...

machiaweliczny 1/16/2026||
Yeah, that's my current gripe, but I think this just needs some good examples in AGENTS.md (I've done some for hooks and it kinda works, but I need to remind it). I need a good AGENTS.md that explains what a good abstraction boundary is and how to define one. The problem is I'm not sure I know how to put it into words - if anyone has an idea, please let me know.
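
For what it's worth, one possible way to put it into words in an AGENTS.md - the wording below is entirely illustrative, adjust to your codebase:

    ## Abstraction boundaries (illustrative excerpt)
    - A module owns one concept; callers should never need to know how it stores or computes it.
    - If changing one decision (storage engine, wire format, vendor) would touch more than one module, the boundary is in the wrong place.
    - Prefer narrow interfaces: plain data in, plain data out; keep framework types at the edges.
    - When unsure, write the caller's ideal call site first, then shape the module to satisfy it.
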
EGreg 1/16/2026||
This is exactly what we found out a year ago for all AI builders. But what is the best way to convince early investors of this thesis? They seem to be all-in on just building everything from scratch end-to-end. Here is what we built:

https://engageusers.ai/ecosystem.pdf

malka1986 1/16/2026||
I am making an app in Elixir.

100% of code is made by Claude.

It is damn good at making "blocks".

However, Elixir seems to be a language that works very well with LLMs, cf. https://elixirforum.com/t/llm-coding-benchmark-by-language/7...

hebejebelus 1/16/2026|
Hmm, that benchmark seems a little flawed (as pointed out in the paper itself). It seems like it may give easier problems for "low-resource" languages such as Elixir, Racket and so forth, since their difficulty filter couldn't solve harder problems in those languages in the first place. FTA:

> Section 3.3:

> Besides, since we use the moderately capable DeepSeek-Coder-V2-Lite to filter simple problems, the Pass@1 scores of top models on popular languages are relatively low. However, these models perform significantly better on low-resource languages. This indicates that the performance gap between models of different sizes is more pronounced on low-resource languages, likely because DeepSeek-Coder-V2-Lite struggles to filter out simple problems in these scenarios due to its limited capability in handling low-resource languages.

It's also now a little bit old - as every AI paper is the second it's published - so I'd be curious to see a newer version.

But I would agree in general that Elixir makes a lot of sense for agent-driven development. Hot code reloading and "let it crash" are useful traits in that regard, I think.

joduplessis 1/16/2026||
Recently I've put Claude/others to use in some agentic workflows with easy menial/repetitive tasks. I just don't understand how people are using these agents in production. The automation is absolutely great, but it requires an insane amount of hand-holding and cleanup.
baq 1/16/2026||
Automate the hand-holding and cleanup, obviously. (Also known as a ‘harness’.)
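
A minimal sketch of what automating the hand-holding can mean in practice - the agent call and the check step are placeholders, not any particular tool's API:

    # Hypothetical harness: run the agent, verify its output, feed failures back, repeat.
    import subprocess

    def run_checks() -> tuple[bool, str]:
        """Placeholder verification step: run the test suite and return (passed, log)."""
        result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
        return result.returncode == 0, result.stdout + result.stderr

    def harness(ask_agent, task: str, max_rounds: int = 3) -> bool:
        """ask_agent is whatever invokes your coding agent; assumed to edit the working tree."""
        prompt = task
        for _ in range(max_rounds):
            ask_agent(prompt)
            passed, log = run_checks()
            if passed:
                return True
            prompt = f"{task}\n\nThe checks failed; fix these errors:\n{log}"
        return False
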
iamleppert 1/16/2026|
I use Claude daily and I 100% disagree with the author. The article reeks of someone who doesn't understand how to manage context appropriately, describe their requirements, or build up a task iteratively with a coding agent. If you have certain requirements or want things done in a certain way, you need to be explicit, and the order in which you do things matters a lot for how efficiently it completes the task and for the quality of the final output. By default it's very good at doing the least amount of work to just make something work, but that's not always what you want. Sometimes it is. I'd much prefer that as the default mode of operation over something that makes a project out of every little change.

The developers who aren't figuring out how to leverage AI tools and make them work for them are going to get left behind very quickly. Unless you're in the top tier of engineers, I'm not sure how one can blame the tools at this point.

More comments...