Posted by bblcla 2 days ago

Claude is good at assembling blocks, but still falls apart at creating them (www.approachwithalacrity.com)
279 points | 204 comments
malka1986 4 hours ago|
I am making an app in Elixir.

100% of the code is written by Claude.

It is damn good at making "blocks".

However, Elixir seems to be a language that works very well for LLMs; cf. https://elixirforum.com/t/llm-coding-benchmark-by-language/7...

hebejebelus 3 hours ago|
Hmm, that benchmark seems a little flawed (as pointed out in the paper). Seems like it may give easier problems to "low-resource" languages such as Elixir and Racket and so forth, since their difficulty filter couldn't solve harder problems in the first place. FTA:

> Section 3.3:

> Besides, since we use the moderately capable DeepSeek-Coder-V2-Lite to filter simple problems, the Pass@1 scores of top models on popular languages are relatively low. However, these models perform significantly better on low-resource languages. This indicates that the performance gap between models of different sizes is more pronounced on low-resource languages, likely because DeepSeek-Coder-V2-Lite struggles to filter out simple problems in these scenarios due to its limited capability in handling low-resource languages.

It's also now a little bit old, as every AI paper is the second it's published, so I'd be curious to see a newer version.

But I would agree in general that Elixir makes a lot of sense for agent-driven development. Hot code reloading and "let it crash" are useful traits in that regard, I think.

michalsustr 21 hours ago||
This article resonates exactly with how I think about it as well. For example, at minfx.ai (a Neptune/wandb alternative), we cache time series that can contain millions of floats for fast access. Any engineer worth their title would never make a copy of these and would pass around pointers for access. Opus, when stuck in a place where passing the pointer was a bit more difficult (due to async and Rust lifetimes), would just make the copy rather than rearchitect, or at least stop and notify the user. Many such examples of ‘lazy’ and thus bad design.
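
For what it's worth, a minimal Rust sketch of the pattern I mean (hypothetical types, not the actual minfx.ai code): wrap each cached series in an Arc so async tasks share one allocation instead of cloning millions of floats.

    use std::collections::HashMap;
    use std::sync::Arc;

    // Hypothetical cache shape: each series can hold millions of f64
    // points, so we hand out shared pointers, never copies of the buffer.
    struct SeriesCache {
        series: HashMap<String, Arc<Vec<f64>>>,
    }

    impl SeriesCache {
        // Cheap: clones the Arc (a pointer plus a refcount bump), not the data.
        fn get(&self, key: &str) -> Option<Arc<Vec<f64>>> {
            self.series.get(key).cloned()
        }
    }

    // The Arc is Send + 'static, so it can move into spawned async tasks
    // without fighting borrow lifetimes, and without the full copy that
    // Vec::clone() would make.
    async fn mean(cache: &SeriesCache, key: &str) -> Option<f64> {
        let points = cache.get(key)?;
        Some(points.iter().sum::<f64>() / points.len() as f64)
    }
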
alphazard 18 hours ago||
This sounds suspiciously like the average developer, which is what the transformer models have been trained to emulate.

Designing good APIs is hard, being good at it is rare. That's why most APIs suck, and all of us have a negative prior about calling out to an API or adding a dependency on a new one. It takes a strong theory of mind, a resistance to the curse of knowledge, and experience working on both sides of the boundary, to make a good API. It's no surprise that Claude isn't good at it, most humans aren't either.

anshumankmr 7 hours ago||
IDK, it's been pretty solid (but it does mess up), which is where I come in. But it has helped me work with Databricks (reading from and writing to it) and train a model using it for some of our customers, though it's NOT in prod.
0xbadcafebee 13 hours ago||
I don't think it's possible to make an AI a "Senior Engineer", or even a good engineer, by training it on random crap from the internet. It's got a million brains' worth of code in it. That means bad patterns as well as good. You'd need to remove the bad patterns for it not to "remember" and regurgitate them. I don't think prompts help with this either; it's like putting a band-aid on head trauma.
HarHarVeryFunny 1 hour ago|
It's also rather like trying to learn flint knapping just by looking at examples of knapped flint (maybe some better than others), rather than having access to descriptions of how to do it, and ultimately any practice of doing it.

You could also use cooking as an analogy - trying to learn to cook by looking at pictures of cooked food rather than by having gone to culinary school and learnt the principles of how to actually plan and cook good food.

So, we're trying to train LLMs to code, by giving them "pictures" of code that someone else built, rather than by teaching them the principles involved in creating it, and then having them practice themselves.

joshcsimmons 19 hours ago||
IDK, I've been using Opus 4.5 to create a UI library and it's been doing pretty well: https://simsies.xyz/ (still early days)

Granted, it was building on top of Tailwind (shifting over to Radix after the layoff news). Begs the question: what is a Lego?

threethirtytwo 19 hours ago||
I don't know how someone can look at what you built and conclude that LLMs are still just Google search. It boggles the mind how much hatred people have for AI, to the point of self-deception. The evidence is placed right in front of you and on your lap with that link, and people still deny it.
mattmanser 19 hours ago||
How do you come to that conclusion?

There are absolutely tons of CodePens of that style. And JSFiddles, CSS Zen Gardens, etc.

I think the true mind-boggle is that you don't seem to realize just how much content the AI companies have stolen.

threethirtytwo 19 hours ago||
>I think the true mind-boggle is that you don't seem to realize just how much content the AI companies have stolen.

What makes you think I don't realize it? Looks like your comment was generated by an LLM, because that was a hallucination that is not true at all.

AI companies have stolen a lot of content for training. I AGREE with this. So have you. That content lives rent free in your head as your memory. It's the same concept.

Legally speaking, though, AI companies are a bit more in the red because the law, from a practical standpoint, doesn't exactly make illegal anything stored in your brain... but from a technical standpoint, information in your brain, on a hard drive, or on a billboard is still information instantiated/copied in the physical world.

The text you write and output is simply a reconfiguration of that information in your head. Look at what you're typing. The English language. It's not copyrighted, but every single word you're typing was not invented by you; the grammar rules and conventions were ripped off existing standards.

ehnto 5 hours ago||
I think you are pointing out the exact conflation here. The commenter probably didn't steal a bunch of code, because it is possible to reason from first principles and rules and still end up being able to code as a human.

It did not take me reading the entirety of available public code to be kind of okay at programming; I created my way to being kind of okay at programming. I was given some rules and worked with those; I did not mnemonic my way into logic.

That none of us scraped and consumed the entire internet is hopefully pretty obvious, but we still have capabilities in excess of AI.

threethirtytwo 43 minutes ago||
What’s being missed here is how fundamentally alien the starting point is.

A human does not begin at zero. A human is born with an enormous amount of structure already in place: a visual system that segments the world into objects, depth, edges, motion, and continuity; a spatial model that understands inside vs outside, near vs far, occlusion, orientation, and scale; a temporal model that assumes persistence through time; and a causal model that treats actions as producing effects. None of this has to be learned explicitly. A baby does not study geometry to understand space, or logic to understand cause and effect. The brain arrives preloaded.

Before you ever read a line of code, you already understand things like hierarchy, containment, repetition, symmetry, sequence, and goal-directed behavior. You know that objects don’t teleport, that actions cost effort, that symbols can stand in for things, and that rules can be applied consistently. These are not achievements. They are defaults.

An LLM starts with none of this.

It does not know what space is. It has no concept of depth, proximity, orientation, or object permanence. It does not know that a button is “on” a screen, that a window contains elements, or that left and right are meaningful distinctions. It does not know what vision is, what an object is, or that the world even has structure. At initialization, it does not even know that logic exists as a category.

And yet, we can watch it learn these things.

We know LLMs acquire spatial reasoning because they can construct GUIs with consistent layout, reason about coordinate systems, generate diagrams that preserve relative positioning, and describe scenes with correct spatial relationships. We know they acquire a functional notion of vision because they can reason about images they generate, anticipate occlusion, preserve perspective, and align visual elements coherently. None of that was built in. It was inferred.

But that inference did not come from code alone.

Code does not contain space. Code does not contain vision. Code does not contain the statistical regularities of the physical world, human perception, or how people describe what they see. Those live in diagrams, illustrations, UI mockups, photos, captions, instructional text, comics, product screenshots, academic papers, and casual descriptions scattered across the entire internet.

Humans don’t need to learn this because evolution already solved it for us. Our visual cortex is not trained from scratch; it is wired. Our spatial intuitions are not inferred; they are assumed. When we read code, we already understand that indentation implies hierarchy, that nesting implies containment, and that execution flows in time. An LLM has to reverse-engineer all of that.

That is why training on “just code” is insufficient. Code presupposes a world. It presupposes agents, actions, memory, time, structure, and intent. To understand code, a system must already understand the kinds of things code is about. Humans get that for free. LLMs don’t.

So the large, messy, heterogeneous corpus is not indulgence. It is compensation. It is how a system with no sensory grounding, no spatial intuition, and no causal priors reconstructs the scaffolding that humans are born with.

Once that scaffolding exists, the story changes.

Once the priors are in place, learning becomes local and efficient. Inside a small context window, an LLM can learn a new mini-language, adopt a novel set of rules, infer an unfamiliar API, or generalize from a few examples it has never seen before. No retraining. No new data ingestion. The learning happens in context.

This mirrors human learning exactly.

When you learn a new framework or pick up a new problem domain, you do not replay your entire lifetime of exposure. You learn from a short spec, a handful of examples, or a brief conversation. That only works because your priors already exist. The learning is cheap because the structure is already there.

The same is true for LLMs. The massive corpus is not what enables in-context learning; it is what made in-context learning possible in the first place.

The difference, then, is not that humans reason while LLMs copy. The difference is that humans start with a world model already installed, while LLMs have to build one from scratch. When you lack the priors, scale is not cheating. It is the price of entry.

But this is beside the point. We know for a fact that outputs from humans and LLMs are novel generalizations and not copies of existing data. It's easily proven by asking either a human or an LLM to write a program that doesn't exist in the universe, and both the human and the LLM can readily do this. So in the end, both the human and the LLM have copied data in their minds and can generalize new data OFF of that copied data. It's just that the LLM has more copied data while the human has less, but both have copied data.

In fact, the priors that a human is born with can even be described as copied data, but encoded in our genes such that we are born with brains that inherit a learning bias optimized for our given reality.

That is what is missing from this comparison. You are looking at the speed of learning during training. The apt comparison there would be reconstructing a human brain neuron by neuron. If you want to compare how fast a human learns a new programming language with how fast an LLM does, the correct comparison is how fast an LLM learns a new programming language AFTER it has been trained, solely within inference in the context window.

In that case, it beats us. Hands down.

subdavis 12 hours ago|||
FYI the cursor animation runs before the font loads if the font isn’t ready yet.
dehugger 19 hours ago||
Your GitHub repo was highly entertaining. Thanks for making my day a bit brighter :)
Scrapemist 20 hours ago||
Eventually you can show Claude how you solve problems, and explain the thought process behind it. It can apply these learnings, but it will encounter new challenges in doing so. It would be nice if Claude could initiate a conversation to go over the issues in depth. Right now it just wants quick confirmation so it can plough ahead.
fennecbutt 20 hours ago|
Well, I feel like this is because a better system would distill such learning into tokens not associated with any human language, and that could represent logic better than using English etc. for it.

I don't have the GPUs or time to experiment though :(

Scrapemist 18 hours ago||
Yes, but I would appreciate it if it used English to explain its logic to me.
joduplessis 11 hours ago|
Recently I've put Claude and others to use in some agentic workflows on easy, menial/repetitive tasks. I just don't understand how people are using these agents in production. The automation is absolutely great, but it requires an insane amount of hand-holding and cleanup.
baq 10 hours ago||
Automate the hand-holding and cleanup, obviously. (Also known as a ‘harness’.)
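
Concretely, a harness here is roughly a loop that re-runs the agent until automated checks pass. A minimal Rust sketch, with a hypothetical agent-cli standing in for whatever tool you actually drive:

    use std::process::Command;

    // Minimal harness sketch: run the agent, run the checks, feed the
    // failures back, and stop once the checks pass or we give up.
    // `agent-cli` is a hypothetical stand-in; `cargo test` is the check.
    fn main() {
        let mut prompt = String::from("Implement the task described in TASK.md.");
        for attempt in 1..=5 {
            // Hand the current instructions (task plus last failures) to the agent.
            Command::new("agent-cli")
                .arg("--prompt")
                .arg(&prompt)
                .status()
                .expect("failed to launch agent");

            // Automated hand-holding: the test suite decides whether we're done.
            let checks = Command::new("cargo")
                .args(["test", "--quiet"])
                .output()
                .expect("failed to run checks");

            if checks.status.success() {
                println!("checks passed on attempt {attempt}");
                return;
            }

            // Automated cleanup: pipe the failures straight into the next attempt.
            prompt = format!(
                "The previous attempt failed these checks; fix them:\n{}",
                String::from_utf8_lossy(&checks.stderr)
            );
        }
        eprintln!("giving up after 5 attempts; a human needs to step in");
    }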