Posted by vismit2000 17 hours ago

How AI assistance impacts the formation of coding skills(www.anthropic.com)
351 points | 283 comments
jmatthews 1 hour ago|
I find this so hard to get my head around. I am wildly more prolific with agentic coding. It's at minimum a 10x for the first several iterations and when you get into the heavy detail part I am still the choke point.
jwr 14 hours ago||
The title of this submission is misleading; that's not what they're saying. They said it doesn't show productivity gains for inexperienced developers still gaining knowledge.
visarga 14 hours ago||
The study measures whether participants learn the library, but what it should study is whether they learn effective coding-agent patterns for using the library well. Learning the library is not going to be what we need in the future.

> "We collect self-reported familiarity with AI coding tools, but we do not actually measure differences in prompting techniques."

Many people drive cars without being able to explain how cars work. Or use devices like that. Or interact with people whose thinking they can't explain. Society works like that; it is functional, and it does not run on full understanding. We need to develop the functional part, not the full-understanding part. We can write C without knowing the machine code.

You can often recognize a wrong note without being able to play the piece, spot a logical fallacy without being able to construct the valid argument yourself, catch a translation error with much less fluency than producing the translation would require. We need discriminative competence, not generative.

For years I maintained a library for formatting dates and numbers (prices, ints, ids, phones). It was a pile of regex, but I maintained hundreds of test cases for each type of parsing. As new edge cases appeared, I added them to my tests and iterated to keep the score high. I don't fully understand my own library; it emerged by scar accumulation. I mean, yes, I can explain any line, but why these regexes in this order is a data-dependent explanation I don't have anymore. All my edits run in a loop with the tests, and my PRs are sent only when the score is good.

Correctness was never grounded in understanding the implementation. Correctness was grounded in the test suite.
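
For concreteness, a minimal sketch of that style in Python (the pattern, function name, and cases here are hypothetical, not the actual library): the regex is an implementation detail, and correctness lives entirely in the accumulated cases.

    import re

    # Hypothetical price normalizer in the spirit described above.
    PRICE_RE = re.compile(r"^\s*\$?\s*(\d{1,3}(?:,\d{3})*|\d+)(?:\.(\d{2}))?\s*$")

    def normalize_price(text):
        """Return a canonical 'dollars.cents' string, or None if unparseable."""
        m = PRICE_RE.match(text)
        if not m:
            return None
        dollars = m.group(1).replace(",", "")
        cents = m.group(2) or "00"
        return f"{dollars}.{cents}"

    # Correctness lives in the accumulated cases, not in a mental model of the regex.
    CASES = {"$1,234.50": "1234.50", " 99 ": "99.00", "$0.99": "0.99", "abc": None}
    for raw, expected in CASES.items():
        assert normalize_price(raw) == expected, (raw, normalize_price(raw))
    print("all cases pass")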

discreteevent 13 hours ago|||
> Many people drive cars without being able to explain how cars work.

But the fundamentals are the same: all cars behave the same way all the time. Imagine running a courier company where sometimes the vehicles take a random left turn.

> Or interact with people whose thinking they can't explain

Sure, but they trust those service providers because they are reliable. And the reason that they are reliable is that the service providers can explain their own thinking to themselves. Otherwise their business would be chaos and nobody would trust them.

How you approached your library was practical given the use case. But can you imagine writing a compiler like this? Or writing an industrial automation system? Not only would it be unreliable but it would be extremely slow. It's much faster to deal with something that has a consistent model that attempts to distill the essence of the problem, rather than patching it hack by hack in response to failed test after failed test.

2sk21 12 hours ago||||
You can, most certainly, drive a car without understanding how it works. A pilot of an aircraft on the other hand needs a fairly detailed understanding of the subsystems in order to effectively fly it.

I think being a programmer is closer to being an aircraft pilot than a car driver.

iammjm 11 hours ago|||
Sure, if you are a pilot then that makes sense. But what if you are a company that uses planes to deliver goods? It's like when the focus shifts from the thing itself to its output.
northfield27 12 hours ago|||
Agreed
gjadi 14 hours ago|||
Interesting argument.

But isn't it the correction of those errors that is valuable to society and gets us a job?

People can tell they found a bug or give a description of what they want from a piece of software, yet it requires skill to fix the bugs and to build the software. Though LLMs can speed up the process, expert human judgment is still required.

another-dave 13 hours ago|||
I think there's different levels to look at it.

If you know that you need O(n) "contains" checks and O(1) retrieval for items, for a given order of magnitude, it feels like you have all the pieces of the puzzle needed to keep the LLM on the straight and narrow, even if you didn't know off the top of your head that you should choose ArrayList.

Or if you know that string manipulation might be memory intensive so you write automated tests around it for your order of magnitude, it probably doesn't really matter if you didn't know to choose StringBuilder.

That feels different to e.g. not knowing the difference between an array list and linked list (or the concept of time/space complexity) in the first place.
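
A rough sketch of the same idea in Python rather than Java (list/set and str.join standing in for ArrayList and StringBuilder, purely as an illustration): knowing the complexity requirement is enough to check the choice, even without recalling the exact class name.

    import timeit

    n = 100_000
    as_list = list(range(n))
    as_set = set(as_list)

    # O(n) membership scan vs O(1) hash lookup.
    print("list contains:", timeit.timeit(lambda: n - 1 in as_list, number=100))
    print("set contains: ", timeit.timeit(lambda: n - 1 in as_set, number=100))

    # Repeated concatenation vs join (the StringBuilder analogue); join is the
    # reliably linear idiom, even though CPython sometimes optimizes +=.
    pieces = ["x"] * 10_000
    def concat():
        s = ""
        for p in pieces:
            s += p
        return s
    print("+= in a loop:", timeit.timeit(concat, number=100))
    print("str.join:    ", timeit.timeit(lambda: "".join(pieces), number=100))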

gjadi 6 hours ago||
My gut feeling is that, without wrestling with data structures at least once (e.g. during a course), that knowledge about complexity will be cargo cult.

When it comes to fundamentals, I think it's still worth the investment.

To paraphrase, "months of prompting can save weeks of learning".

visarga 13 hours ago|||
I think the kind of judgement required here is to design ways to test the code without inspecting it manually line by line (that would be walking the motorcycle); you would only be vibe-testing. That is why we have seen the FastRender browser and the JustHTML parser: the testing part was solved upfront, so the AI could go nuts implementing.
northfield27 13 hours ago||
I partially agree, but I don’t think “design ways to test the code without inspecting it manually line by line” is a good strategy.

Tests only cover cases you already know to look for. In my experience, many important edge cases are discovered by reading the implementation and noticing hidden assumptions or unintended interactions.

When something goes wrong, understanding why almost always requires looking at the code, and that understanding is what informs better tests.

visarga 13 hours ago||
Another possibility is to implement the same spec twice and do differential testing; you can catch diverging assumptions and clarify them.
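
A minimal sketch of that differential-testing loop, with a hypothetical slugify spec implemented twice and fuzzed against itself; any disagreement surfaces an assumption worth clarifying.

    import random

    def slug_v1(s):
        # First implementation of a hypothetical slugify spec.
        return "-".join(s.lower().split())

    def slug_v2(s):
        # Second, independently written implementation of the same spec.
        out, prev_dash = [], True
        for ch in s.lower():
            if ch.isspace():
                if not prev_dash:
                    out.append("-")
                    prev_dash = True
            else:
                out.append(ch)
                prev_dash = False
        return "".join(out).rstrip("-")

    random.seed(0)
    alphabet = "ab c  D\t"
    for _ in range(10_000):
        s = "".join(random.choice(alphabet) for _ in range(random.randint(0, 12)))
        assert slug_v1(s) == slug_v2(s), (s, slug_v1(s), slug_v2(s))
    print("both implementations agree on 10,000 random inputs")
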
northfield27 12 hours ago||
Isn't that too much work?

Instead, just learning concepts with AI and then using HI (Human Intelligence) and AI to solve the problem at hand (by going through code line by line and writing tests) is a better approach productivity-, correctness-, efficiency-, and skill-wise.

I can only think of LLMs as fast typists with some domain knowledge.

Like typists of government/legal documents who know how to format documents but cannot practice law. Likewise, LLMs are code typists who can write good/decent/bad code but cannot practice software engineering - we need, and will need, a human for that.

concats 14 hours ago|||
I agree. It's very misleading. Here's what the authors actually say:

> AI assistance produces significant productivity gains across professional domains, particularly for novice workers. Yet how this assistance affects the development of skills required to effectively supervise AI remains unclear. Novice workers who rely heavily on AI to complete unfamiliar tasks may compromise their own skill acquisition in the process. We conduct randomized experiments to study how developers gained mastery of a new asynchronous programming library with and without the assistance of AI. We find that AI use impairs conceptual understanding, code reading, and debugging abilities, without delivering significant efficiency gains on average. Participants who fully delegated coding tasks showed some productivity improvements, but at the cost of learning the library. We identify six distinct AI interaction patterns, three of which involve cognitive engagement and preserve learning outcomes even when participants receive AI assistance. Our findings suggest that AI-enhanced productivity is not a shortcut to competence and AI assistance should be carefully adopted into workflows to preserve skill formation -- particularly in safety-critical domains.

danbruc 14 hours ago||
That itself sounds contradictory to me.

> AI assistance produces significant productivity gains across professional domains, particularly for novice workers.

> We find that AI use impairs conceptual understanding, code reading, and debugging abilities, without delivering significant efficiency gains on average.

Are the two sentences talking about non-overlapping domains? Is there an important distinction between productivity and efficiency gains? Does one focus on novice users and one on experienced ones? Admittedly I have not read the paper yet; it might be clearer than the abstract.

mold_aid 10 hours ago|||
Not seeing the contradiction. The two sentences suggest a distinction between novice task completion and supervisory (i.e., mastery) work. "The role of workers often shifts from performing the task to supervising the task" is the second sentence in the report.

The research question is: "Although the use of AI tools may improve productivity for these engineers, would they also inhibit skill formation? More specifically, does an AI-assisted task completion workflow prevent engineers from gaining in-depth knowledge about the tools used to complete these tasks?" This hopefully makes the distinction more clear.

So you can say "this product helps novice workers complete tasks more efficiently, regardless of domain" while also saying "unfortunately, they remain stupid." The introductory lit review/context setting cites prior studies to establish "ok, coders complete tasks efficiently with this product." But then they say, "our study finds that they can't answer questions." They have to say "earlier studies find that there were productivity gains" in order to ask "do these gains extend to other skills? Maybe not!"

capnrefsmmat 10 hours ago||||
The first sentence is a reference to prior research work that has found those productivity gains, not a summary of the experiment conducted in this paper.
torginus 14 hours ago|||
That doesn't really line up with my experience. I wanted to debug a CMake file recently, having done no such thing before, and AI helped me walk through the potential issues, explaining what I got wrong.

I learned a lot more in a short amount of time than I would've stumbling around on my own.

Afaik it's been known for a long time that the most effective way of learning a new skill is to get private tutoring from an expert.

yoz-y 12 hours ago|||
This highly depends on your current skill level and amount of motivation. AI is not a private tutor, as AI will not actually verify that you have learned anything unless you prompt it. Which means that you must not only know exactly what to search for (arguably already an advanced skill in CS) but also know how tutoring works.
torginus 9 hours ago||
My skill level when it comes to CMake is just north of nonexistent, but I was highly motivated as it kinda blocked me in what I actually wanted to do.
hxugufjfjf 13 hours ago|||
Has the claim in your third paragraph been backed by research? Not snark, genuinely curious. I have some anecdotal, personal experience backing it up.
omnicognate 14 hours ago|||
I agree the title should be changed, but as I commented on the dupe of this submission, learning is not something that happens as a beginner, student, or "junior" programmer and then stops. The job is learning, and after 25 years of doing it I learn more per day than ever.
mold_aid 10 hours ago||
The study doesn't argue that you stopped learning.
omnicognate 7 hours ago||
I didn't say it did. I just pointed out that learning effectively isn't only a concern for "inexperienced developers still gaining knowledge".
emsign 14 hours ago||
> They said it doesn't show productivity gains for inexperienced developers still gaining knowledge.

But that's what "impairs learning" means.

dr_dshiv 15 hours ago||
Go Anthropic for transparency and commitment to science.

Personally, I’ve never been learning software development concepts faster—but that’s because I’ve been offloading actual development to other people for years.

suralind 11 hours ago||
No surprise, really. You can use AI to explore new horizons or propose an initial sketch, but for anything larger than small changes you must do a rewrite. Not just a review. An actual rewrite. AI can do well adding a function, but you can't vibe code an app and get smarter.

I don't necessarily think that writing more code makes you a better coder. I automate nearly all my tests with AI, and a large chunk of bugfixing as well. I will regularly ask AI to propose an architecture or introduce a new pattern if I don't have a goal in my mind. But in those last two examples, I will always redesign the entire approach into what I consider a better, cleaner interface. I don't recall AI ever getting that right, but I must admit I asked AI in the first place cos I didn't know where to start.

If I had to summarize, I would say to let AI implement the code, but not the API design/architecture. But at the same time, you can only get good at those by knowing what doesn't work and trying to find a better solution.

teiferer 11 hours ago||
> I automate nearly all my tests with AI

How exactly? Do you tell the agent "please write a test for this" or do you also feed it some form of spec to describe what the tested thing is expected to do? And do these tests ever fail?

Asking because the first option essentially just sets the bugs in stone.

Wouldn't it make sense to do it the other way around? You write the test, let the AI generate the code? The test essentially represents the spec, and if the AI produces something which passes all your tests but is still not what you want, then you have a test hole.
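
Something like the following, as a sketch of that test-as-spec direction (parse_duration and the durations module are hypothetical names): the human writes the failing test, then asks the agent for an implementation that makes it pass.

    import pytest

    # Module to be written by the agent; the tests below are the spec it must satisfy.
    from durations import parse_duration

    @pytest.mark.parametrize("text,seconds", [
        ("90s", 90),
        ("2m", 120),
        ("1h30m", 5400),
    ])
    def test_parse_duration(text, seconds):
        assert parse_duration(text) == seconds

    def test_rejects_garbage():
        with pytest.raises(ValueError):
            parse_duration("soon")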

suralind 11 hours ago||
I'm not saying my approach is correct, keep that in mind.

I care more about the code than the tests. Tests are verification of my work. And yes, there is a risk of AI "navigating around" bugs, but I found that a lot of the time AI will actually spot a bug and suggest a fix. I also review each line to look for improvements.

Edit: to answer your question, I will typically ask it to test a specific test case or a few test cases. Very rarely will I ask it to "add tests everywhere". Yes, these tests frequently fail, and the agent will fix them on the 2nd+ iteration after it runs the tests.

One more thing to add is that a lot of the time the agent will add a "dummy" test. I don't really accept those just for coverage's sake.

teiferer 10 hours ago||
Thanks for your responses!

A follow-up:

> I care more about the code than the tests.

Why is that? Your (product) code has tests. Your test (code) doesn't. So I often find that I need to pay at least as much attention to my tests to ensure quality.

suralind 9 hours ago||
I think you are correct in your assessment. Both are important. If you're gonna have garbage tests, you're gonna have garbage quality.

I find tests easier to write. Your function(s) may be a hundred lines long, but the test is usually setup, run, assert.

I don't have much experience beyond writing unit/integration tests, but individual test cases seem to be simpler than the code they test (linear, no branches).

james_marks 11 hours ago|||
This is why the quality of my code has improved since using AI.

I can iterate on entire approaches in the same amount of time it would have taken to explore a single concept before.

But AI is an amplifier of human intent: I want a code base that's maintainable, scalable, etc., and that's different from YOLO vibe coding. Vibe engineering, maybe.

acedTrex 7 hours ago||
My core uses are 100% racing the model in yolo mode to find a bug. I win most of the time but occasionally it surprises me.

Then also switching architectural approaches quickly when I find some code strategies that aren't ergonomic. Splitting off behaviors and other refactors are much lower cost now.

mickeyp 11 hours ago||
> No surprise, really. You can use AI to explore new horizons or propose an initial sketch, but for anything larger than small changes you must do a rewrite. Not just a review. An actual rewrite. AI can do well adding a function, but you can't vibe code an app and get smarter.

Sometimes I wonder if people who make statements like this have ever actually casually browsed Twitter or reddit or even attempted a "large" application themselves with SOTA models.

JustSkyfall 11 hours ago|||
You can definitely vibecode an app, but that doesn't mean that you can necessarily "get smarter"!

An example: I vibecoded myself a Toggl Track clone yesterday - it works amazingly but if I had to rewrite e.g. the PDF generation code by myself I wouldn't have a clue!

suralind 10 hours ago||
That's what I meant; it's either/or. Vibe coding definitely has a place for simple utilities or "in-house" tools that solve one problem. You can't vibe code and learn (if you do, then it's not vibe coding as I define it).
suralind 10 hours ago|||
Did I say that you can't vibe code an app? I browse reddit and have seen the same apps as you have; I also vibe code myself every now and then and know what happens when you let it loose.
northfield27 14 hours ago||
Edit: Changed title

Previous title: "Anthropic: AI Coding shows no productivity gains; impairs skill development"

The previous title oversimplified the claim to "all" developers. I found the previous title meaningful while submitting this post because most of the false AI claims of "software engineer is finished" have mostly affected junior `inexperienced` engineers. But I think `junior inexperienced` was implicit, which many people didn't pick up on.

The paper makes a more nuanced claim that AI Coding speeds up work for inexperienced developers, leading to some productivity gains at the cost of actual skill development.

simonw 14 hours ago||
Key snippet from the abstract:

> Novice workers who rely heavily on AI to complete unfamiliar tasks may compromise their own skill acquisition in the process. We conduct randomized experiments to study how developers gained mastery of a new asynchronous programming library with and without the assistance of AI. We find that AI use impairs conceptual understanding, code reading, and debugging abilities, without delivering significant efficiency gains on average.

The library in question was Python trio and the model they used was GPT-4o.
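
Not one of the study's tasks, just a minimal illustration of the structured-concurrency style that trio participants had to pick up:

    import trio

    async def fetch(name, delay):
        await trio.sleep(delay)   # stand-in for real I/O
        print(f"{name} done after {delay}s")

    async def main():
        # Every task started in the nursery finishes (or is cancelled)
        # before the async-with block exits.
        async with trio.open_nursery() as nursery:
            nursery.start_soon(fetch, "a", 0.2)
            nursery.start_soon(fetch, "b", 0.1)

    trio.run(main)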

Wojtkie 4 hours ago||
This is interesting. I started teaching myself Polars and used Claude to help me muscle through some documentation in order to meet deadlines on a project.

I found that Claude wasn't too great at it at first and returned a lot of hallucinated methods, or methods that existed in Pandas but not Polars. I chalk this up to context blurring and to there probably being a lot less Polars code in the training corpus.

I found it most useful for quickly pointing me to the right documentation, where I'd learn the right implementation and then use it. It was terrible for the code, but helpful as a glorified doc search.
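
As an illustration of the kind of slip-up described above (this is not the commenter's code), a Pandas habit next to its Polars equivalent:

    import polars as pl

    df = pl.DataFrame({"team": ["a", "a", "b"], "score": [3, 5, 2]})

    # Pandas habit (no such API in Polars):
    #   df[df["score"] > 2].groupby("team")["score"].mean()
    # Polars expression API:
    out = (
        df.filter(pl.col("score") > 2)
          .group_by("team")
          .agg(pl.col("score").mean().alias("mean_score"))
    )
    print(out)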

Kiboneu 5 hours ago||
When coding agents are unavailable I just continue to code myself or focus on architecture specification / feature descriptions. This really helps me retain my skills, though there is some "skew" (I'm not sure how to describe it, it's a feeling). Writing instructions for LLMs is, to me, pretty similar to doing the basic software architecture and specification work that a lot of people tend to skip (now there's no choice, and it's directly useful). When you skip specification for a sufficiently complex project, you likely introduce footguns along the way that slow down development significantly. So what would one expect when they run a bunch of agents based on a single-sentence prompt?!

Like the architecture work and making good-quality specs, working on the code myself has a guiding effect on the coding agents. So in a way, it also helps to clarify items that may be more ambiguous in the spec. If I write some of the code myself, the agent will make fewer assumptions about my intent when it touches it (especially when I didn't specify them in the architecture, or if they are difficult to articulate in natural language).

In small iterations, the agent checks back for each task. Because I spend a lot of time on architecture, I already have a model in my mind of how the small code snippets and features will connect.

Maybe my comfort with reviewing AI code comes from spending a large chunk of my life reverse engineering human code, understanding it to the extent that complex bugs and vulnerabilities emerge. I've spent a lot of time with different styles of code writing, from awful to "this programmer must have a permanent line to god to do this so elegantly". The models are trained on that, so I have a little cluster of neurons in my head that's shaped closely enough to follow the model's shape.

lelanthran 10 hours ago||
I must say I am quite impressed that Anthropic published this, given that they found that:

1. AI help produced a solution only 2m faster, and

2. AI help reduced retention of skill by 17%

siliconc0w 3 hours ago|
It's pretty insidious to think that these AI labs want you to become so dependent on them that once the VC gravy train stops they can hike the token price 10x and you'll still pay because you have no other choice.

(thankfully market dynamics and OSS alternatives will probably stop this, but it's not a guarantee; you need at least six viable firms before you usually see competitive behavior)

Zababa 3 hours ago|
>It's pretty insidious to think that these AI labs want you to become so dependent on them that once the VC gravy train stops they can hike the token price 10x and you'll still pay because you have no other choice.

I don't think that's true? From what I understand most labs are making money from subscription users (maybe not if you include training costs, but still, they're not selling at a loss).

>(thankfully market dynamics and OSS alternatives will probably stop this, but it's not a guarantee; you need at least six viable firms before you usually see competitive behavior)

OpenAI is very aggressive with the volume of usage you can get from Codex, Google/DeepMind with Gemini. Anthropic reduced the token price with the latest Opus release (4.5).

More comments...