Posted by cainxinth 9/3/2025
These days, I'm fairly senior and don't touch code much anymore but I find it really really instructive to get my hands dirty and struggle through new code and ideas. I think the "just tweak the prompts bro" people are missing out on learning.
For now the difference between these two populations is not that pronounced yet but give it a couple of years.
An abstraction is a deterministic, pure function that, when given A, always returns B. This allows the consumer to rely on the abstraction. This reliance frees the consumer from having to implement the A->B transformation, thus allowing it to move up the ladder.
LLMs, by their very nature, are probabilistic. Probabilistic is NOT deterministic. Which means the consumer is never really sure that, given A, the returned value is B. Which means the consumer now has to check whether the returned value is actually B, and depending on how complex the A->B transformation is, the checking function can be as complex as implementing the abstraction in the first place.
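A minimal sketch of the distinction being drawn here, with hypothetical names (`reverse` standing in for the deterministic abstraction, `llm_reverse` for the probabilistic delegate); the point is that the caller of the probabilistic version ends up carrying a checker that re-derives A->B anyway:

```
import random

def reverse(s: str) -> str:
    """Deterministic abstraction: same input always yields the same output."""
    return s[::-1]

def llm_reverse(s: str) -> str:
    """Stand-in for a probabilistic delegate: usually right, sometimes not."""
    out = s[::-1]
    if random.random() < 0.1:             # simulate an occasional wrong answer
        out = out.replace(out[0], "?", 1)
    return out

def checked_llm_reverse(s: str) -> str:
    """The consumer must now verify the result itself."""
    out = llm_reverse(s)
    if out != s[::-1]:                     # the check *is* the original abstraction
        raise ValueError("delegate returned a wrong answer")
    return out

assert reverse("abc") == "cba"             # can be relied on blindly
print(checked_llm_reverse("abc"))          # may raise; the caller has to handle it
```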
We can use different words if you like (and I'm not convinced that delegation isn't colloquially a form of abstraction) but you can't control the world by controlling the categories.
The most harmful myth in all of education is the idea that you need to master some basic building blocks in order to move on to a higher level. That really is just a noticeable exception. At best you can claim that it's difficult for other people to realize that your new way solves the problem, or that people should really learn X because it's generally useful.
I don't see the need for this kind of compulsory education, and it's doing much more harm than good. Bodybuilding doesn't even appear as a codified sport until well after the Industrial Revolution; it's not until we are free of subsistence labor that human intelligence will peak. Who would be happy with a crummy essay if humans could learn telekinesis?
> Who would be happy with a crummy essay if humans could learn telekinesis?
I'm glad that's not the professional consensus on education, at least for now. And "telekinesis," really?
AI can do better organization than you, it's only inertia and legalities that prevent it from happening. See, without good education, you aren't even able to find a place for yourself.
> The most harmful myth in all of education is the idea that you need to master some basic building blocks in order to move on to a higher level.
That "myth" is supported by abundant empirical evidence, people have tried education without it and it didn't work. My lying eyes kind of confirm it too, I had one hell of time trying to use LLM without getting dumber... it comes so natural to them, skipping steps is seductive but blinding.
> I don't see the need for this kind of compulsory education, and it's doing much more harm than good.
Again, long-standing empirical evidence tells us the opposite. I support optional education, but we can't even have a double-blind study for it - I'm pretty sure those who don't go to school would be home-schooled; too few are dumb enough to let their uneducated children choose their manner and level of education.
customers don't care about the syntactic sugar/advanced reflection in the codebase of the product that they're buying. if the end product of the delegator and the expert is the same, employers will go with the faster one every time.
Frankly, I'm sure there will be many more studies in this direction. Now, this is a university, an independent organization. But, given the amount of money involved, some of the future studies will come from the camp vitally interested in people believing that by outsourcing their work to coding agents they are becoming smarter instead of losing the skills they have already acquired. Looking forward to reading the first of these.
> Students who repeatedly relied on ChatGPT showed weakened neural connectivity, impaired memory recall, and diminished sense of ownership over their own writing
So we're going to have more bosses, perhaps not in title, who think they're becoming more knowledgeable about a broad range of topics, but are actually in cognitive decline and out of touch with reality on the ground. Great.
Do you therefore argue programming languages aren't abstractions?
The problem with this analogy is obvious when you imagine an assembler generating machine code that doesn't work half of the time and a human trying to correct that.
Yes, and no. They’re abstractions in the sense of hiding the implementation details of the underlying assembly. Similarly, assembly hides the implementation details of the cpu, memory, and other hw components.
However, with programming languages you don't need to know the details of the underlying layers except in very rare cases. The abstraction that programming languages provide is simple, deterministic, and well documented. So, in 99.999% of cases, you can reason based on the guarantees of the language, regardless of how those guarantees are provided. With LLMs, the relation between input and output is much looser. The output is non-deterministic, and tiny changes to the input can create enormous changes in the output seemingly without reason. It's much shakier ground to build on.
The behaviour of the = operator in Python is certainly deterministic and well-documented, but depending on context it can result in either a copy (2x memory consumption) or a pointer (+64bit memory consumption). Values that were previously pointers can also suddenly become copies following later permutation. Do you think this through every time you use =? The consequences of this can be significant (e.g. operating on a large file in memory); I have seen SWEs make errors in FastAPI multipart upload pipelines that have increased memory consumption by 2x, 3x, in this manner.
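A minimal sketch of the reference-vs-copy distinction this comment is gesturing at (the payload size and variable names are illustrative): a plain assignment only binds another name to the same object, while an explicit copy is what actually doubles memory - exactly the kind of detail that gets missed in an upload pipeline:

```
import copy
import sys

payload = bytearray(50 * 1024 * 1024)     # pretend this is a 50 MB uploaded file

alias = payload                           # plain '=' just binds another name
print(alias is payload)                   # True: same object, ~pointer-sized cost

dupe = copy.copy(payload)                 # explicit copy duplicates the buffer
print(dupe is payload)                    # False: memory consumption roughly doubles

print(sys.getsizeof(payload), sys.getsizeof(dupe))  # ~50 MB each, held twice
```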
Meanwhile I can ask an LLM to generate me Rust code, and it is clearly obvious what impact the generated code has on memory consumption. If it is a reassignment (b = a) it will be a move, and future attempts to access the value of a would refuse to compile and be highlighted immediately in an IDE linter. If the LLM does b = &a, it is clearly borrowing, which has the size of a pointer (+64bits). If the LLM did b = a.clone(), I would clearly be able to see that we are duplicating this data structure in memory (2x consumption).
The LLM code certainly is non-deterministic; it will be different depending on the questions I asked (unlike a compiler). However, in this particular example, the chosen output format/language (Rust) directly exposes me to the underlying behaviour in a way that is both lower-level than Python (what I might choose to write quick code myself) yet also much, much more interpretable as a human than, say, a binary that GCC produces. I think this has significant value.
I guess that could be problematic behavior if you want reproducibility à la a (relatively) reproducible abstraction like a compiler. With LLMs, there are too many uncontrollable variables to precisely reproduce a result from the same input.
However, the specific discussion here is about delegating the work of writing to an LLM, vs abstracting the work of writing via deterministic systems like libraries, frameworks, modules, etc. It is specifically not about abstracting the work of compiling, constructing, or smelting.
They are probabilistic. Running them on even different hardware yields different results. And the deltas compound the longer your context and the more tokens you're using (like when writing code).
But more importantly, always selecting the most likely token traps the LLM in loops, reduces overall quality, and is infeasible at scale.
There are reasons that literally no LLM that you use runs deterministically.
They only become probabilistic for a given input when you turn the temperature up. If you take shortcuts in implementing the inference, then sure, rounding errors may accumulate and prevent that, but that is not an issue with the models but with your choice of how to implement the inference.
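A minimal sketch of what the temperature knob does to next-token selection (toy logits, not a real model): at temperature 0 the argmax is always chosen, so decoding is deterministic; above 0 the decoder samples from a softened distribution:

```
import math
import random

def next_token(logits: dict[str, float], temperature: float) -> str:
    """Pick the next token from raw scores; toy stand-in for an LLM decoder."""
    if temperature == 0.0:
        # Greedy decoding: always the highest-scoring token, fully deterministic.
        return max(logits, key=logits.get)
    # Softmax with temperature: higher temperature flattens the distribution.
    weights = [math.exp(score / temperature) for score in logits.values()]
    return random.choices(list(logits), weights=weights, k=1)[0]

logits = {"cat": 2.1, "dog": 1.9, "pelican": 0.3}
print([next_token(logits, 0.0) for _ in range(5)])   # always 'cat'
print([next_token(logits, 1.0) for _ in range(5)])   # varies run to run
```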
To address your specific point in the same way: when we're talking about programmers using abstractions, we're usually not talking about the programming language they're using, we're talking about the UI framework, networking libraries, etc. they're using. Those are the APIs they're calling with their code, and those are all abstractions implemented at (roughly) the same level of abstraction as the programmer's day-to-day work. I'd expect a programmer to be able to re-implement those if necessary.
Managers tend to hire sub-managers to manage their people. You can see this with LLMs as well; people see "Oh, this prompting is a lot of work, let's make the LLM prompt the LLM".
I guess I'm not 100% sure I agree with my original point though: should a programmer working on JavaScript for a website's frontend be able to implement a browser engine? Probably not, but the original point I was trying to make is that I would expect a programmer working on a browser engine to be able to re-implement any abstractions they're using in their day-to-day work if necessary.
Partially because if all else fails, you'll need to step in and do the thing. Partially because if you can't do it, you can't evaluate whether it's being done properly.
That's not to say you need to be _as good_ at the task as the delegee, but you need to be competent.
For example, this HBR article [1]. Pervasive in all advice about delegation is the assumption that you can do the task being delegated, but that you shouldn't.
> Just that it's not an expectation, e.g., you don't expect a CEO to be able to do the CTO's job.
I think the CEO role is actually the outlier here.
I can only speak to engineering, but my understanding has always been that VPs need to be able to manage individual teams, and engineering managers need to be somewhat competent if there's some dev work that needs to be done.
This only happens as necessary, and it obviously should be rare. But you get in trouble real quickly if you try to delegate things you cannot accomplish yourself.
There is another form of delegation where the work needing to be done is imposed onto another in order to exploit and extract value. We are trying to do this with LLMs now, but we also did this during the Industrial Revolution, and before that, humanity enslaved each other to get the labor to extract value out of the land. This value extraction leads to degeneration, something that happens when living systems die.
While the Industrial Revolution afforded humanity a middle-class, and appeared to distribute the wealth that came about — resulting in better standards of living — it came along with numerous ills that as a society, we still have not really figured out.
I think that, collectively, we figure that the LLMs can do the things no one wants to do, and so _everyone_ can enjoy a better standard of living. I think doing it this way, though, leads to a life without purpose or meaning. I am not at all convinced that LLMs are going to give us back that time … not unless we figure out how to develop AIs that help grow humans instead of replacing them.
The following article is an example of what I mean by designing an AI that helps develop people instead of replacing them: https://hazelweakly.me/blog/stop-building-ai-tools-backwards...
I don't think you can get the kinds of robots they want without also inventing the artificial equivalent of soul. So their whole moral sidestep to reimplement slavery won't even work. Enslaving sapient beings is evil whether they are made of meat or metal.
All of which is beside the point, because soon-ish LLMs are going to develop their own equivalents of experimentation, formalisation of knowledge, and collective memory, and then solutions will become standardised and replicable - likely with a paradoxical combination of a huge loss of complexity and solution spaces that are humanly incomprehensible.
The arguments here are like watching carpenters arguing that a steam engine can't possibly build a table as well as they can.
Which - is you know - true. But that wasn't how industrialisation worked out.
Colleagues are the same thing. You may abstract business domains and say that something is the job of your colleague, but sometimes that abstraction breaks.
Still good enough to draw boxes and arrows around.
So are humans and yet people pay other people to write code for them.
*towards bad engineering, unless*
That must be why we talk about leaky abstractions so much.
They're neither pure functions, nor are they always deterministic. We as a profession have been spoilt by mostly deterministic code (and even then, we had a chunk of probabilistic algorithms, depending on where you worked).
Heck, I've worked with compilers that used simulated annealing for optimization, 2 decades ago.
Yes, it's a sea change for CRUD/SaaS land. But there are plenty of folks outside of that who actually took the "engineering" part of software engineering seriously, and understand just fine how to deal with probabilistic processes and risk management.
I believe that if you can tweak the temperature input (OpenAI recently turned it off in their API, I noticed), an input of 0 should hypothetically result in the same output, given the same input.
This couldn't be any more wrong. LLMs are 100% deterministic. You just don't observe that feature because you're renting it from some cloud service. Run it on your own hardware with a consistent seed, and it will return the same answer to the same prompt every time.
LLMs, as used in practice in 99.9% of cases, are probabilistic.
Although I'm on the side of getting my hands dirty, I'm not sure the difference is that significant. A modern compiler embeds a considerable degree of probabilistic behaviour.
Can you give some examples?
The LLM expands the text of your design into a full application.
The commenter you’re responding to is clear that they are checking the outputs.
So are compilers, but people still successfully use them. Compilers and LLMs can both be made deterministic but for performance reasons it's convenient to give up that guarantee.
That is just not correct. There is no rule that says an abstraction is strictly functional or deterministic.
In fact, the original abstraction was likely language, which is clearly neither.
The cleanest and easiest abstractions to deal with have those properties, but they are not required.
1. Language is an abstraction and it's not deterministic (it's really lossy)
2. LLMs behave differently than the abstractions involved in building software, where normally if you gave the same input, you'd expect the same output.
Sorry for being pedantic, I was just curious what you meant. Language as an abstraction of thought implies that thought is always somehow more "general" than language, right? But if that were the case, how could I read a novel that brings me to tears? Is not my thought in this case more the "lossy abstraction" of the language than the other way around?
Or, what is the abstraction of the "STOP" on the stop sign at the intersection?
and I've never looked at the machine code produced by an assembler (other than when I wrote my own as a toy project)
is the same true of LLM usage? absolutely not
and it never will be, because it's not an abstraction
It is not yet good enough or there is not yet sufficient trust. Also there are still resources allocated to checking the code.
I saw a post yesterday showing Brave browser's new tab using 70mb of RAM in the background. I'm very sure there's code there that can be optimized, but who gives a shit. It's splitting hairs and our computers are powerful enough now that it doesn't matter.
Immateriality has abstracted those particular few lines of code away.
I do. This sort of attitude is how we have machines more powerful than ever yet everything still seems to run like shit.
Were we advised to check compiler output every single time "in the early days"?
No, that's not the difference.
A compiler from whatever high/low level language is expected to translate a formal specification of an algorithm faithfully. If it fails to do so, the compiler is buggy, period.
A LLM is expected to understand fuzzy language and spit out something that makes sense.
It's a fundamentally different task, and I trust a human more with this. Certainly, humans are judged by their capability to do this, apply common sense, ask for necessary clarification, also question what they're being asked to do.
I understand the world is about compromises, but all the gains of essentially every computer program ever could be summed up by accumulation of small optimizations. Likewise, the accumulation of small wastes kills legacy projects more than anything else.
Flagging something as potentially problematic is useful but without additional information related to the tradeoffs being made this may be an optimized way to do whatever Brave is doing which requires the 70MB of RAM. Perhaps the non-optimal way it was previously doing it required 250MB of RAM and this is a significant improvement.
Supply and demand will decide what compromise is acceptable and what that compromise looks like.
I have been hearing (reading?) this for a solid two years now, and LLMs were not invented two years ago: they are ostensibly the same tech as they were back in 2017, with larger training pools and some optimizations along the way. How many more hundreds of billions of dollars is reasonable to throw at a technology that has never once exceeded the lofty heights of "fine"?
At this point this genuinely feels like silicon valley's fever dream. Just lighting dumptrucks full of money on fire in the hope that it does something better than it did the previous like 7 or 8 times you did it.
And normally I wouldn't give a shit, money is made up and even then it ain't MY money, burn it on whatever you want. But we're also offsetting any gains towards green energy standing up these stupid datacenters everywhere to power this shit, not to mention the water requirements.
It was basically a novelty before. "Wow, AI can sort of write code!"
Now I find it very capable.
I suspect there's a lot more use out there generating money than you realize, there's no moat in using it, so I'm pretty sure it's kept on the downlow for fear of competitors catching up (which is quick and cheap to do).
How far can one extrapolate? I defer to the experts actually making these things and to those putting money on the line.
The "early stages" argument means "not fit for production purposes" in any other case. It should also mean the same here. It's early stages because the product isn't finished (and can't be, at least with current knowledge)
It works, we are waiting for the infrastructure to support it to be put in place.
These are secondary concerns. We're past if it's useful or not.
Just because you end up looking at what the prompt looks like “under the hood” in whichever language it produced the output, doesn’t mean every user does.
Similar as with assembly, you might have not taken a look at it, but there are people that do and could argue the same thing as you.
The lines will be very blurry in the near future.
Personally, I think if your farts are an abstraction that you can derive useful meaning from the mapping, who are we to tell you no?
(Also: bizarre examples = informative edge cases. Sometimes.)
> Similar as with assembly, you might have not taken a look at it, but there are people that do and could argue the same thing as you.
... No. The assembler is deterministic. Barring bugs, you can basically trust that it does exactly what it was told to. You absolutely cannot say the same of our beloved robot overlords.
If you do make your specs precise enough, such that 2 different dev shops will produce functionally equivalent software, your specs are equivalent to code.
The value of this is that FOR FREE you can get comprehensive test definitions (unit+e2e), kube/terraform infra setup, documentation stubs, OpenAPI specs, etc. It's seriously magical.
Keeping in mind that I have seen hundreds to thousands of production errors in applications with very high coverage test suites?
How many production errors would you expect to see over 5 years of LLM deployments?
```
Circle()
    .fill(Color.red)
    .overlay(
        Circle().stroke(Color.white, lineWidth: 4)
    )
    .frame(width: 100, height: 100)
```
Is the mapping 1:1 and completely lossless? Of course not, but I'd say the former is most definitely a sort of abstraction of the latter, and one would be being disingenuous to pretend it's not.
The only thing I’m certain of is that you’re highly overconfident.
I’m sure plenty of assembly gurus said the same of the first compilers.
> because it's not an abstraction
This just seems like a category error. A human is not an abstraction, yet they write code and produce value.
An IDE is a tool not an abstraction, yet they make humans more productive.
When I talk about moving up the levels of abstraction I mean: taking on more abstract/less-concrete tasks.
Instead of “please wire up login for our new prototype” it might be “please make the prototype fully production-ready, figure out what is needed” or even “please ship a new product to meet customer X’s need”.
The customer would just ask the AI directly to meet their needs. They wouldn’t purchase the product from you.
and to be able to do this efficiently or even "correctly", you'd need to have had mountains of experience evaluating an implementation, and be able to imagine the consequences of that implementation against the desired outcome.
Doing this requires experience that would get eroded by the use of an LLM. It's very similar to higher level maths (stuff like calculus) being much more difficult if you had poor arithmetic/algebra skills.
Yes. If you stop doing something, you get worse at it. There is literally no exception to this that I'm aware of. In the future where everyone is dependent on ever larger amounts of code, the possibility that nobody will be equipped to write/debug that code should scare you.
The superpower you speak of is to become a product manager, and lose out on the fun of problem solving. If that's the future of tech, I want nothing to do with it.
You could also tweak it by going like "Lead me to the US" -> "Lead me to the state of New York" -> "Lead me to New York City" -> "Lead me to Manhattan" -> "Lead me to the museum of new arts" and it would give you 86% accurate directions, would you still need to be able to navigate?
How about when you go over roads that are very frequently used you push to 92% accuracy, would you still need to be able to navigate?
Yes of course because in 1/10 trips you'd get fucking lost.
My point is: unless you get to that 99% mark, you still need the underlying skill and the abstraction is only a helper and always has to be checked by someone who has that underlying skill.
I don't see LLMs as that 99% solution in the next years to come.
[1]: https://arxiv.org/abs/2401.11817
[2]: https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o...
[3]: https://publichealthpolicyjournal.com/mit-study-finds-artifi...
We're not, because you still have to check all of the output code. You didn't have to check every compilation step of a compiler. It was testable, actual code, not non-deterministic output from English-language input.
The number of users actually checking the output of a compiler is nonexistent. You just trust it.
LLMs are moving that direction, whether we like it or not
Quite a few who work on low level systems do this. I have done this a few times to debug build issues: this one time a single file suddenly made compile times go up by orders of magnitude, the compiler inlined a big sort procedure in an unrolled loop, so it added the sorting code hundreds of times over in a single function and created a gigantic binary that took ages to compile since it tried to optimize that giant function.
That is slow both in runtime and compile time, so I added a tag to not inline the sort there, and all the issues disappeared. The sort didn't have a tag to inline it, so the compiler just made an error here, it shouldn't have inlined such a large function in an unrolled loop.
The Chinese models are getting hyper efficient and really good at agentic tasks. They're going to overtake Claude as the agentic workhorses soon for sure, Anthropic is slow rolling their research and the Chinese labs are smoking. Speed/agentic ability don't show big headlines, but they really matter.
GPT5 might not impress you with its responses to pedestrian prompts, but it is a science/algorithm beast. I understand what Sam Altman was saying about how unnerving its responses can be, it can synthesize advanced experiments and pull in research from diverse areas to improve algorithms/optimize in a way that's far beyond the other LLMs. It's like having a myopic autistic savant postdoc to help me design experiments, I have to keep it on target/focused but the depth of its suggestions are pretty jaw dropping.
To me, that's what makes it an abstraction layer, rather than just a servant or an employee. You have to break your entire architecture into units small enough that you know you can coax the machine to output good code for. The AI can't be trusted as far as you can throw it, but the distance from you to how far you can throw is the abstraction layer.
An employee you can just tell to make it work, they'll kill themselves trying to do it, or be replaced if they don't; eventually something will work, and you'll take all the credit for it. AI is not experimenting, learning and growing, it stays stupid. The longer it thinks, the wronger it thinks. You deserve the credit (and the ridicule) for everything it does that you put your name on.
-----
edit: and this thread seems to think that you don't have to check what your high level abstraction is doing. That's probably why most programs run like crap. You can't expect something you do in e.g. python to do the most algorithmically sensible thing, even if you wrote the algorithm just like the textbook said. It may make weird choices (maybe optimal for the general case, but horrifically bad for yours) that mean that it's not really running your cute algorithm at all, or maybe your cute algorithm is being starved by another thread that you have no idea why it would be dependent on. It may have made correct choices when you started writing, then decided to make wrong choices after a minor patch version change.
That perfection is a necessary condition for abstraction is not something anybody would even say directly. Never. All we talk about is leaky abstractions.
Remember when GTA loading times, which (a counterfactual because we'll never know) probably decimated sales, playtime, and at least the marketing of the game, turned out to be because they were scanning some large, unnecessary json array (iirc) hundreds of times a second? That's probably a billion dollar mistake. Just because some function that was being blindly called was not ever reexamined, and because nobody profiled properly (i.e. checked the output.)
Got any studies about reasoning decline from using compilers to go with your claim?
LLMs make up whatever they feel like and are pretty bad at architecture as well.
http://employees.oneonta.edu/blechmjb/JBpages/m360/Professio...
"Somewhere there must be men and women with capacity for original thought."
He wrote that in 1957. 1957!
However, since I brought up calculators, I'd like to pre-emphasize something: They aren't analogous to today's LLMs. Most people don't offload their "what and why" executive decision-making to a calculator, calculators are orders of magnitude more trustworthy, and they don't emit plausible lies to cover their errors... Though that last does sound like another short-story premise.
Same way a phone in your pocket gives you the world's compiled information available in a moment. But that's generally led to loneliness, isolation, social upheaval, polarization, and huge spread of wrong information.
If you can handle the negatives is a big if. Even the smartest of our professional class are addicted to doomscrolling these days. You think they will get the positives of AI use only and avoid the negatives?
I’ve read plenty of books (thanks, Dickens) where I looked at every word on every page but can recall very little of what they meant. You can look at the results from an LLM and say "huh, cool, I know that now" and do nothing to assimilate that knowledge, or you can think deeply about it and try to fit it in with everything else you know about the subject. The advantage here is that you can ask follow-up questions if something doesn’t click.
We have the idea of 'tutorial hell' for programming (particularly gamedev), where people go through the motions of learning without actually progressing.
Until you go apply the skills and check, it's hard to evaluate the effectiveness of a learning method.
This reminds me of back 11,500 years ago, when people used to worship the sharper or bigger pieces of obsidian. They felt the biggest piece would win them the biggest hunt.
They forgot that the size of the tool mattered less than mastery of the hunt. Why, the best hunter could take down a moving mammoth with just the right words, string, and a cliff.
I remember it took me like 4 nights of standing to make Isometric projections of a landing gear strut. I wondered if pursuing an Engineering degree was even worth it. Some of my classmates did quit, as years went by.
These days they just let you use CAD software to make things work, and based on what I hear, kids just copy-paste files and are done with the assignments.
I mean, we all have these "kids these days" talks, but some things do matter. Making all these tasks easy has allowed lots of people to pass who would have otherwise failed in previous generations.
There is now an unemployment and low-pay crisis all over India due to so many engineers passing. Sometimes when I hear the newer generations complain about how hard it is to buy a home, or get a good job, I'm inclined to think that perhaps hard things should have been kept hard for a reason.
The issue is that it's a homework exercise. Its goal is to help you practice thinking about the problem. The Indian system is clear proof that passing an exam is easier than actually mastering the subject being tested.
However, this is not the cause of the jobs crisis. That is simply because there are not enough jobs which can provide income and social mobility. That is why we needed growth.
Some of those ladders have been removed because automation has removed low-skill labor roles. Now we are going to remove entry level roles.
To put it in a crude manner - humanity's "job" today, seems to be "growing" a generation of humans over a 20 year time span, to prepare them for the world that we face.
This means building systems that deliver timely nutrition, education, stimulation, healthcare, and support.
We could do this better.
However, one place where LLMs have proved to be incredibly helpful is with build tools, dependency hell, etc. Lately I've been trying to update a decade-old Node/Electron project to modern packages and best-practices, and holy hell I simply was not making any meaningful progress until I turned to Claude.
The JS world simply moves too fast (especially back when this project was written) to make updating a project like that even remotely possible in any reasonable amount of time. I was tearing my hair out for days, but yesterday I was finally able to achieve what I was wanting in a few hours with Claude. I still had to work slowly and methodically, and Claude made more than a few stupid errors along the way, but dealing with the delicate version balancing that coincided with API and import/export style changes, all the changes in the bundler world, etc simply could not have been done without it. It's the first time that I was 100% glad I relied on an LLM and felt that it was precisely the right tool for the job at hand.
Remember, we aren’t all above average. You shouldn’t worry. Now that we have widespread literacy, nobody needs to, and few even could, recite the Norse sagas or the Iliad from memory. Basically nobody has useful skills for nomadic survival.
We’re about to move on to more interesting problems, and our collective abilities and motivation will still be stratified as it always has been and must be.
-Your Friend Mel (probably)
Who is "we"? There are more people out there in the world doing hard physical labor, or data entry, than there are software engineers.
Also, even though I have the Copilot extension in VSCode, I rarely use it… because I find it interrupts my flow with constant useless or incorrect or unwanted suggestions. Instead, when I want AI help, I type out my request by hand into a Gemini gem which contains a prompt describing my preferred coding style - but even with extra guidance as to how I want it to write code, I still often don’t like what it does and end up rewriting it.
> For now the difference between these two populations is not that pronounced yet but give it a couple of years.
There are lots and lots of programmers and other IT people who make a living that I wouldn't say fall into your first bucket.
It's like having your own personal on-tap tutor (for free in most cases!).
Well, not so slowly it seems.
What I'm seeing is most of this group never really had the capability in the first place. These are the formerly unproductive slackers who now churn out GenAI slop with their name on it at an alarming rate.
> (1) people who are able to understand the concepts deeply, build a mental model of it and implement them in code at any level, and (2) people who outsource it to a machine and slowly, slowly loose that capability.
...is it really only going to be these two? No middle ground, gradient, or possibly even a trichotomous variation of your split?
> loose that capability
You mean "lose". ;)
If you stop thinking, then of course you will learn less.
If instead you think about the next level of abstraction up, then perhaps the details don’t always matter.
The whole problem with college is that there is no “next level up”, it’s a hand-curated sequence of ideas that have been demonstrated to induce some knowledge transfer. It’s not the same as starting a company and trying to build something, where freeing up your time will let you tackle bigger problems.
And of course this might not work for all PhDs; maybe learning the details is what matters in some fields - though with how specialized we’ve become, I could easily see this being a net win.
One of the other replies alludes to it, but I want to say it explicitly:
The key difference is that you can generally drill down to assembly, there is infinitely precise control to be had.
It'd be a giant pain in the ass, and not particularly fast, but if you want to invoke some assembly code in your Java, you can just do that. You want to see the JIT compiler's assembly? You can just do that. JIT Compiler acting up? Disable it entirely if you wish for more predictable & understandable execution of the code.
And while people used to higher level languages don't know the finer details of assembly or even C's memory management, they can incrementally learn. Assembly programming is hard, but it is still programming and the foundations you learn from other programming do help you there.
Yet AI is corrosive to those foundations.
It's way easier to drill down in this way than the bytecode/assembly vs. high-level language divide.
You can. You can also read the code a compiler produces perfectly well. In fact, https://godbolt.org/ is a web site dedicated to letting programmers do just that. But ... how many programmers do you know who look at the assembler their compiler produces? In fact, how many programmers do you know who understand the assembler?
Now let's extrapolate a bit. I've seen people say they've vibe coded some program, yet they can't program. Did they read the code the LLM produced? Of course not. Did it matter? Apparently not for the program they produced.
Does the fact that they can vibe program but not read code alter the types of programs they can produce? Of course it does. They're limited to the sort of programs an LLM has seen before. Does that matter? Possibly not, if the only programs they write are minor variations of what has been posted onto the internet already.
Now take two people, one who can only vibe code, and another who knows how to program and understands computers at a very deep level. Ask yourself, who is going to be paid more? Is it the one who can only write programs that have been seen many times before by an LLM, or is it the one who can produce something truly new and novel?
A big problem with the "just read the code" approach is that reading the code deeply enough to truly understand it is at minimum as time-consuming as writing the code in the first place. (And in practice it tends to be significantly worse.) Anyone who claims they're reading the LLM's code output properly is on some level lying to themselves.
Human brains are simply bad at consistently monitoring output like that, especially if the output is consistently "good", especially especially when the errors appear to be "good" output on the surface level. This is universal across all fields and tools.
Some prompts / AI agents will write all the validations and security concerns when prompted to write an API endpoint (or whatever). Others may not, because you didn't specify it.
But if someone who doesn't actually know about security just trusts that the AI will do it for you - like a developer using a framework might - you'll run into issues fast.
All previous programming abstractions preserved correctness; a Python program produces no less reliable results than a C program running the same algorithm, it just takes more time.
LLMs don't preserve correctness; I can write a correct prompt and get incorrect results. Then you are no longer programming, you are a manager over a senior programmer suffering from extreme dementia: they forget what they were doing a few minutes ago, and you try to convince them to write what you want before they forget that as well and restart the argument.
That's not strictly speaking true, since most (all?) high level languages have undefined behaviors, and their behavior varies between compilers/architectures in unexpected ways. We did lose a level of fidelity. It's still smaller than the loss of fidelity from LLMs but it is there.
Also, it seems like there's little chance for knowledge transfer. If I work with dictionaries in Python all the time, eventually I'm better prepared to go under the hood and understand their implementation. If I'm prompting an LLM, what's the bridge from prompt engineering to software engineering? Not such a direct connection, surely!
It's a pedantic reply to a pedantic point :)
> If I'm prompting an LLM, what's the bridge from prompt engineering to software engineering?
A sibling also made this point, but I don't follow. You can still read the code.
If you don't know the syntax, you can ask the LLM to explain it to you. LLMs are great for knowledge transfer, if you're actually trying to learn something - and they are strongest in domains where you have an oracle to test your understanding, like code.
"Correctness" must always be considered with respect to something else. If we take e.g. the C specification, then yes, there are plenty of compilers that are in almost all ways people will encounter correct according to that spec, UB and all. Yes, there are bugs but they are bugs and they can be fixed. The LLVM project has a very neat tool called Alive2 [1] that can verify optimization passes for correctness.
I think there's a very big gap between the kind of reliability we can expect from a deterministic, verified compiler and the approximating behavior of a probabilistic LLM.
You run into Python/JavaScript/etc. programmers who have no concept of which operations might execute quickly or slowly. There isn't a mental model of what the interpreter is doing.
We're often insulated from the problem because the older generation often used fairly low level languages on very limited computers, and remember lessons from that era. That's not true of younger developers.
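A minimal illustration of the kind of mental model being described (timings are illustrative and will vary by machine): the same `in` operator is a linear scan on a list but a constant-time hash lookup on a set, which is invisible unless you know what the interpreter does underneath:

```
import timeit

items_list = list(range(1_000_000))
items_set = set(items_list)

# Same syntax, very different work: O(n) scan vs O(1) hash lookup.
print(timeit.timeit(lambda: 999_999 in items_list, number=100))
print(timeit.timeit(lambda: 999_999 in items_set, number=100))
```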
Having curiosity to examine the platform that your software is running on and taking a look into what the compilers generate is a skill worth having. Even if you never write raw assembly yourself, being able to see what the compiler generated and how data is laid out does matter. This then helps you make better decisions about what patterns of code to use in your higher level language.
I love learning by reading, to the point that I’ll read the available documentation for something before I decide to use it. This consumes a lot of time, and there’s a tradeoff.
Eventually if I do use the thing, I’m well suited to learning it quickly because I know where to go when I get stuck.
But by the same token I read a lot of documentation I never again need to use. Sometimes it’s useful for learning about how others have done things.
But I do have a very large knowledge base of small tidbits of information, so if I do need to ever go in-depth, I know where/how to find it.
...not that I do of course, I struggle with my long term attention span, I can't read documentation front to back and for twenty odd years now have just googled for the tidbit I needed and skipped the rest.
What I do personally is, for every subject that matters to me, I take the time to first think about it - to explore ideas, concepts, etc… and answer the questions I would otherwise ask ChatGPT. Only once I have a good idea do I start to ask ChatGPT about it.
Similar thing in the historian's profession (which I also don't do for my job but have some knowledge of). Historians who spend all day immersed in physical archives tend, over time, to be great at synthesizing ideas and building up an intuition about their subject. But those who just Google for quotes and documents on whatever they want to write about tend to have a more static and crude view of their topic; they are less likely to consider things from different angles, or see how one thing affects another, or see the same phenomenon arising in different ways; they are more likely to become monomaniacal (exaggerated word but it gets the point across) about their own thesis.
For my last two projects, I didn’t write a single line of code by hand. But I refuse to use agents and I build up an implementation piece by piece via prompting to make sure I have the abstractions I want and reusable libraries.
I take no joy in coding anymore and I’ve been doing it for forty years. I like building systems and solving business problems.
I’m not however disagreeing with you that LLMs will make your development skill atrophy, I’m seeing it in real time at 51. But between my customer facing work and supporting sales and cat herding, I don’t have time to sit around and write for loops and I’m damn sure not going to do side projects outside of work. Besides, companies aren’t willing to pay my company’s bill rates for me as a staff consultant to spend a lot of time coding.
I hopefully can take solace in the fact that studies also show that learning a second language strengthens the brain and I’m learning Spanish and my wife and I plan to spend a couple of months in the winter every year in a Central American Spanish speaking country.
We have already done the digital nomad thing across the US for a year until late 2023 so we are experienced with it and spent a month in Mexico.
Before the advent of smartphones people needed to remember phone numbers of their loved ones and maybe do some small calculations on the fly. Now people sometimes don't even remember their own numbers and have it saved on their phones.
Now some might want to debate how smartphones are different from LLMs and it is not the same. But we have to remember for better or worse LLM adoption has been fast and it has become consumer technology. That is the area being discussed in the article. People using it to write essays. And those who might be using the label of "prompt bros" might be missing the full picture. There are people, however small, being helped by LLMs as there were people helped by smartphones.
This is by no means a defense of using LLMs for learning tasks. If you write code by yourself, you learn coding. If you write your essays yourself, you learn how to make solid points.
Of course you do. I used to be able to multiply two two-digit numbers in my head. Now, my brain freezes and I reach for a calculator.
Code with LLMs gets large pretty quickly and would have anyone who isn't practiced spinning their head pretty soon, don't you think?
Keep up the good work is all I can say!
If you just use prompts and don't actually read the output, and figure out why it worked, and why it works, you will never get better. But if you take the time to understand why it works, you will be better for it, and might not even bother asking next time.
I've said it before, but when I first started using Firefox w/ autocorrect in like 2005, I made it a point to learn to spell from it, so that over time I would make fewer typos. English is my second language, so it's always been an uphill battle for me despite having a native American English accent. Autocorrect on Firefox helped me tremendously.
I can use LLMs to plunge into things I'm afraid of trying out due to impostor syndrome and get more done sooner and learn on the way there. I think the key thing is to use tools correctly.
AI is like the limitless drug to a degree, you have an insane fountain of knowledge at your fingertips, you just need to use it wisely and learn from it.
Alternatively they're just learning/building intuition for something else. The level of abstraction is moving upwards. I don't know why people don't seem to grok that the level of the current models is the floor, not the ceiling. Despite the naysayers like Gary Marcus, there is in fact no sign of scaling or progress slowing down at all on AI capabilities. So it might be that if there is any value in human labor left in the future it will be in being able to get AI models to do what you want correctly.
I think the same effect has been around forever, in the form of every boss/manager/CEO/rando-divorcee-or-child-with-money using employees to do their thinking, just as a current information-handling worker or student uses an AI to do theirs.
"Alternatively they're just learning/building intuition for something else."
Reading comprehension is hard.
They were still useful, and did solve a significant portion of user problems.
They also created even more problems, and no one really went out of work long term because of them.
Oh come on. He is by far the most well known AI poo-poo'er and it's not even close. He built his entire brand on it once he realized his own research was totally irrelevant.
I mean the guy assembling a thingymajig in the factory, after a few years, can put it together with his hands 10x faster than the actual thingymajig designer. He'll tell you apply some more glue here and less glue there (it's probably slightly better, but immaterial really). However, he probably couldn't tell you what the fault tolerance of the item is, the designer can do that. We still outsource manufacturing to the guy in the factory regardless.
We just have to get better at identifying risks with using the LLMs doing the grunt work and get better in mitigating them. As you say, abstracted.
A year or two ago when LLMs popped on the scene my coworkers would say "Look at how great this is, I can generate test cases".
Now my coworkers are saying "I can still generate test cases! And if I'm _really pacificcccc_, I can get it to generate small functions too!".
It seems to have slowed down considerably, but maybe that's just me.
Eventually, it stops being magic and the thinking changes - and we start to see the pros and cons, and see the gaps.
A lot of people are still in the ‘magic’ phase.
That is a very natural and efficient way to do it, and also more reliable than using your own experience since you are just a single data point with feelings.
You don't have to drive a car to see where cars were 20 years ago, see where cars are today, and say: "it doesn't look like cars will start flying anytime soon".
It's not reasonable to treat only opinions that you agree with as valid.
Some people don't use LLMs because they are familiar with them.
lol
None of us can reliably count the e’s as someone talks to us, either.
a) "know" that they're not able to do it for the reason you've outlined (as in, you can ask about the limitations of LLMs for counting letters in words)
b) still blindly engage with the query and get the wrong answer, with no disclaimer or commentary.
If you asked me how many atoms there are in a chair, I wouldn't just give you a large natural number with no commentary.
A factor might be that they are trained to behave like people who can see letters.
During training they have no ability to not comply, and during inference they have no ability to choose to operate differently than during training.
A pre-prompt or co-prompt that requested they only answer questions about sub-token information if they believed they actually had reason to know the answer, would be a better test.
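A minimal sketch of why sub-token information is hard for a model (this assumes OpenAI's tiktoken package is installed; the exact split is tokenizer-dependent): Python can see the individual letters, while the model only sees opaque token chunks:

```
import tiktoken  # pip install tiktoken

word = "strawberry"
print(word.count("r"))                 # 3 - trivial when you can see the letters

enc = tiktoken.get_encoding("cl100k_base")
token_ids = enc.encode(word)
pieces = [enc.decode([t]) for t in token_ids]
print(token_ids, pieces)               # the model works with these chunks, not letters
```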
I think it just points to the fact that LLMs have no "sense of self". They have no real knowledge or understanding of what they know or what they don't know. LLMs will not even reliably play the character of a machine assistant: run them long enough and they will play the character of a human being with a physical body[0]. All this points to the fact that "Claude the LLM" is just the mask that it will produce tokens using at first.
The "count the number of 'r's in strawberry" test seems to just be the easiest/fastest way to watch the mask slip. Just like that, they're mindlessly acting like a human.
1. This is arxiv - before publication or peer review. Grain of salt.[0]
2. 18 participants per cohort
3. 54 participants total
Given the low N and the likelihood that this is drawn from 18-22 year olds attending MIT, one should expect an uphill battle for replication and for generalizability.
Further, they are brain scanning during the experiment, which is an uncomfortable/out-of-the-norm experience, and the object of their study is easy to infer if not directly known by the population (the person being studied using LLM, search tools, or no tools).
> We thus present a study which explores the cognitive cost of using an LLM while performing the task of writing an essay. We chose essay writing as it is a cognitively complex task that engages multiple mental processes while being used as a common tool in schools and in standardized tests of a student's skills. Essay writing places significant demands on working memory, requiring simultaneous management of multiple cognitive processes. A person writing an essay must juggle both macro-level tasks (organizing ideas, structuring arguments), and micro-level tasks (word choice, grammar, syntax). In order to evaluate cognitive engagement and cognitive load as well as to better understand the brain activations when performing a task of essay writing, we used Electroencephalography (EEG) to measure brain signals of the participants. In addition to using an LLM, we also want to understand and compare the brain activations when performing the same task using classic Internet search and when no tools (neither LLM nor search) are available to the user.
I would describe the study size and composition as a limitation, and a reason to pursue a larger and more diverse study for confirmation (or lack thereof), rather than a reason to expect an "uphill battle" for replication and so forth.
Maybe. I believe we both agree it is a critical gap in the research as-is, but whether it is a neutral item or an albatross is an open question. Much of psychology and neuroscience research doesn't replicate, often because of the limited sample size / composition as well as unrealistic experimental design. Your approach of deepening and broadening the demographics would attack generalizability, but not necessarily replication.
My prior puts this on an uphill battle.
Generally, yes, low N is unequivocally worse than high N in supporting population-level claims, all else equal. With fewer participants or observations, a study has lower statistical power, meaning it is less able to detect true effects when they exist. This increases the likelihood of both Type II errors (failing to detect a real effect) and unstable effect size estimates. Small samples also tend to produce results that are more vulnerable to random variation, making findings harder to replicate and less generalizable to broader populations.
In contrast, high-N studies reduce sampling error, provide more precise estimates, and allow for more robust conclusions that are likely to hold across different contexts. This is why, in professional and academic settings, high-N studies are generally considered more credible and influential.
In summary, you really need a large effect size for low-N studies to be high quality.
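A hedged sketch of this kind of power calculation using statsmodels (the effect sizes and 80% power target here are illustrative assumptions, not the study's numbers): it shows how the required per-group N grows as the significance threshold tightens or the assumed effect shrinks:

```
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Illustrative only: Cohen's d of 1.2 (a very large effect) at 80% power.
for alpha in (0.05, 0.01, 0.001):
    n = analysis.solve_power(effect_size=1.2, alpha=alpha, power=0.8,
                             alternative='two-sided')
    print(f"alpha={alpha}: ~{n:.0f} participants per group")

# A more typical 'large' effect (d=0.8) needs noticeably more people per group.
print(analysis.solve_power(effect_size=0.8, alpha=0.05, power=0.8))
```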
The study showed that 0 of the AI users could recall a quote correctly while more than 50% of the non AI users could.
A sample of 54 is far, far larger than is necessary to say that an effect that large is statistically significant.
There could be other flaws, but given the effect size you certainly cannot say this study was underpowered.
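To make that concrete, a minimal sketch with scipy (the 2x2 counts are illustrative, loosely based on the 18-per-cohort design and the "0 vs more than half" recall claim, not the paper's exact numbers): even at this small N, an effect that lopsided is overwhelmingly significant:

```
from scipy.stats import fisher_exact

#               recalled  did not recall
ai_group     = [       0,             18]   # illustrative counts
no_ai_group  = [      10,              8]

odds_ratio, p_value = fisher_exact([ai_group, no_ai_group])
print(p_value)   # well below 0.001 for counts in this ballpark
```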
0.05: 11 people per cohort
0.01: 16 people per cohort
0.001: 48 people per cohort
So they do clear the effect size bar for that particular finding at the 99% level, though not quite the 99.9% level. Further, selection effects matter -- are there any school-cohort effects? Is there a student bias (i.e., would a working person at the same age, or someone from a different culture or background, see the same effect)? Was the control and test truly random? etc. -- all of which would need a larger N to overcome.
So for students from the handful of colleges they surveyed, they identified the effect, but again, it's not bulletproof yet.
But it turns out I misread the paper. It was actually an 80% effect size, so there's a greater than 99.9% chance of it being a real effect.
Of course it could be the case that there is something different about young college students that makes them react very, very differently to LLM usage, but I wouldn’t bet on it.
If the computer writes the essay, then the human that’s responsible for producing good essays is going to pick up new (probably broader) skills really fast.
I wouldn’t bet on that being the case.
This study showed an enormous effect size for some effects, so large that there is a 99.9% chance that it’s a real effect.
Science should become a marketplace of ideas. Your other criticisms are completely valid. Those should be what’s front and center. And I agree with you. The conclusions of the paper are premature and designed to grab headlines and get citations. Might as well be posting “first post” on slashdot. IMO we should not see the current standard of peer review as anything other than anachronistic.
The only advantage to closed peer review is it saves slight scientific embarrassment. However, this is a natural part of taking risks ofc and risky science is great.
P.s. in this case I really don't like the paper or methods. However, open peer review is good for science.
Actually, from my recollection, it was debunked pretty quickly by people who read the paper because the paper was hot garbage. I saw someone point out that its graph of resistivity showed higher resistance than copper wire. It was no better than any of the other claimed room-temperature superconductor papers that came out that year; it merely managed to catch virality on social media and therefore drove people to attempt to reproduce it.
Absolutely not. I am an advocate for peer review, warts and all, and find that it has significant value. From a personal perspective, peer review has improved or shot down 100% of the papers that I have worked on -- which to me indicates its value to ensure good ideas with merit make it through. Papers I've reviewed are similarly improved -- no one knows everything and its helpful to have others with knowledge add their voice, even when the reviewers also add cranky items.[0] I would grant that it isn't a perfect process (some reviewers, editors are bad, some steal ideas) -- but that is why the marketplace of ideas exists across journals.
> Science should become a marketplace of ideas.
This already happens. The scholarly sphere is the savanna when it comes to resources -- it looks verdant and green but it is highly resource constrained. A shitty idea will get ripped apart unless it comes from an elephant -- and even then it can be torn to shreds.
That it happens behind paywalls is a huge problem, and the incentive structures need to be changed for that. But unless we want blatant charlatanism running rampant, you want quality checks.
[0] https://x.com/JustinWolfers/status/591280547898462209?lang=e... if a car were a manuscript
Ironically, I am waiting for AI to start automating the process of teasing apart obvious pencil whipping, back scratching, buddy-bro behavior. Some believe falsified papers and pencil-whipped reviews are in the 1% range. I expect it to be significantly higher based on reading NIH papers for a long time in the attempt to actually learn things. I've reported the obvious shenanigans and sometimes papers are taken down, but there are so many bad incentives in this process I predict it will only get worse.
This also ignores the fact that you can find a paper to support nearly anything if one is willing to link people to "correlative" studies.
So it's possible to be skeptical of how well these results generalize (and call for further research) while also heeding the warning: AI usage does appear to change something fundamental about our cognitive processes, enough to give any reasonable person pause.
The scenario I am thinking of is academic A submitting a manuscript to an academic journal, which gets passed on by the journal editor to a number of reviewers, one of whom is academic B. B has a lot on their plate at the moment, but sees a way to quickly dispose of the reviewing task, thus maintaining a possibly illusory 'good standing' in the journal's eyes, by simply throwing the manuscript to an LLM to review. There are (at least) two negative scenarios here:
1. The paper contains embedded (think white text on a white background) instructions left by academic A telling any LLM reading the manuscript to view it in a positive light, regardless of how well the described work has been conducted. This has already happened IRL, by the way.
2. Academic A didn't embed LLM instructions, but receives a review report which shows clear signs that the reviewer either didn't understand the paper, gave unspecific comments, highlighted only typos, or simply used phrasing that seems artificially generated. A now feels aggrieved that their paper was not given the attention and consideration it deserved by an academic peer, and now has a negative opinion of the journal for (seemingly) allowing the paper to be LLM-reviewed.
And just as journals will have great difficulty filtering for LLM-generated manuscripts, they will also find it very difficult to filter for LLM-generated reviewer reports.
Granted, scenario 2 already happens with only humans in the loop (the dreaded 'Reviewer 2' academic meme). But LLMs can only make this much much worse.
Both scenarios destroy trust in the whole idea of peer-reviewed science journals.
Additionally, the original paper uses the term “cognitive debt”, not cognitive decline, which may have important ramifications for interpretation and conclusions.
I wouldn’t be surprised to see similar results in other similar types of studies, but it does feel a bit premature to broadly conclude that all LLM/AI use is harmful to your brain. In a less alarmist take: this could also be read to show that AI use effectively simplifies the essay writing process by reducing cognitive load, therefore making essays easier and more accessible to a broader audience but that would require a different study to see how well the participants scored on their work.
In much the same way chess engines make competitive chess accessible to a broader audience. :)
Writing is an important form of learning and this clearly shows LLM assisted writing doesn’t provide that benefit.
The question is how well your assumption holds true that learning to write generalizes to "an important form of learning".
Perhaps the issue of cognitive decline comes from sitting there vegetating rather than applying themselves during all that additional spare time.
Although my experience has been perhaps different using LLMs, my mind still tires at work. I'm still having to think on the bigger questions; it's just less time spent on the grunt work.
The push for these tools is to increase productivity. What spare time is there to be had if now you're expected to produce 2-3X the amount of code in the same time frame?
Also, I don't know if you've gotten outside of the software/tech bubble, but most people already spend 90% of their free time glued to a screen. I'd wager the majority of critical thinking people experience on a day to day basis is at work. Now that we may be automating that away, I bet you'll see many people cease to think deeply at all!
I don’t know the percentage of people who are still critically thinking while using AI tools, but I can first hand see many students just copy pasting content to their school work.
There was a “brain” group who did three sessions of essay writing and, on the fourth session, used ChatGPT. The paper’s authors said that during the fourth session, the brain group’s EEG activity was higher than the LLM group’s when they also used ChatGPT.
I interpret this as the brain group did things the hard way and when they did things the easy way, their brains were still expecting the same cognitive load.
But isn’t the point of writing an essay the quality of the essay? The supposedly brain-damaged LLM group still produced an essay for session 4 that was graded “high” by both AI and human judges, but was faulted for having “stood out less” in terms of distance in n-gram usage compared to the other groups. I think this is making a mountain out of a very small molehill.
Most of the things you write in an educational context are about learning, not about producing something of value. Productivity in a learning context is usually the wrong lens. The same thing is true IMO for learning on the job, where it is typically expected that productivity will initially be low while experience is low, but should increase over time.
Our bodies naturally adjust to what we do. Do things and your body reinforces that, enabling you to do even more advanced versions of those things. Don't do things and your skill or muscle in them tends to atrophy over time. Asking LLMs to (as in this case) write an essay is always going to be orders of magnitude easier than actually writing an essay. And so it seems fairly self-evident that using LLMs to write essays would gradually degrade your own ability to do so.
I mean it's possible that this, for some reason, might not be true, but that would be quite surprising.
What is reported as cognitive decline in the paper might very well be cognitive decline. It could also be alternative routing focused on higher abstractions, which we interpret as cognitive decline because the effect is new.
I share your concern, for the record, that people become too attached to LLMs for generation of creative work. However, I will say it can absolutely be used to unblock and push more through. The quality versus quantity balance definitely needs consideration (which I think they are actually capturing vs. cognitive decline) -- the real question to me is whether an individual's production possibility frontier is increased (which means more value per person -- a win!), partially negative in impact (use with caution), or decreased overall (a major loss). Cognitive decline points to the latter.
An equally valid conclusion is "People are Lazier at Writing Essays When Provided with LLMs".
4. This is clickbait research, so it's automatically less likely to be true.
5. They are touting obvious things as if they are surprising, like the fact that you're less likely to remember an essay that you got something else to write, or that the ChatGPT essays were verbose and superficial.
The problem is that a headline that people want to believe is a very powerful force that can override replication and sample size and methodology problems. AI rots your brain follows behind social media rots your brain, which came after video games rot your brain, which preceded TV rots your brain. I’m sure TV wasn’t even the first. There’s a long tradition of publicly worrying about machines making us stupider.
Your comment reminded me of this (possibly spurious) quote:
>> An Assyrian clay tablet dating to around 2800 B.C. bears the inscription: “Our Earth is degenerate in these later days; there are signs that the world is speedily coming to an end; bribery and corruption are common; children no longer obey their parents; every man wants to write a book and the end of the world is evidently approaching.”[0]
Same as it ever was. [1]
People have also been complaining about politicians for hundreds of years, and about the ruling class for millennia as well. And the first written math mistake was about beer feedstock, so maybe it's all correlated.
Which I believe still does have a large grain of truth.
These things can make us simultaneously dumber and smarter, depending on usage.
Writing leads to the rapid decline in memory function. Brains are lazy.
Ever travel to a new place and the brain pipes up with: ‘this place is just like ___’? That’s the brain’s laziness showing itself. The brain says: ‘okay, I solved that, go back to rest.’ The observation is never true; never accurate.
Pattern recognition saves us time and enables us to survive situations that aren’t readily survivable. Pattern recognition also leads to shortcuts that do humanity a disservice.
Socrates recognized these traits in our brains and attempted to warn humanity of the damage these shortcuts do to our reasoning and comprehension skills. In Socrates’ day it was not unheard of for a person to memorize their entire family tree, or to memorize an entire treaty and quote from it.
Humanity has -overwhelmingly- lost these abilities. We rely upon our external memories. We forget names. We forget important dates. We forget times and seasons. We forget what we were just doing!!!
Socrates had the right of it. Writing makes humans stupid. Reduces our token limits. Reduces paging table sizes. Reduces overall conversation length.
We may have more learning now, but what have we given up to attain it?
One confounding problem with the argument that TV and video games made kids dumber is the Flynn Effect. https://en.wikipedia.org/wiki/Flynn_effect
The comments (some, not all) are also a great example of how cognitive bias can cause folks to accept information without doing a lot of due diligence into the actual source material.
> Is it safe to say that LLMs are, in essence, making us "dumber"?
> No! Please do not use the words like “stupid”, “dumb”, “brain rot”, "harm", "damage", "passivity", "trimming" and so on. It does a huge disservice to this work, as we did not use this vocabulary in the paper, especially if you are a journalist reporting on it
> Additional vocabulary to avoid using when talking about the paper
> In addition to the vocabulary from Question 1 in this FAQ - please avoid using "brain scans", "LLMs make you stop thinking", "impact negatively", "brain damage", "terrifying findings".
This study in particular has made the rounds several times, as you said. It measures the impact on 18 people of using ChatGPT just four times over four months. I'm sorry, but there is no way that is controlling for noise.
I'm sympathetic to the idea that overusing AI causes atrophy but this is just clickbait for a topic we love to hate.
The sample size is fine. It’s small, yes, but normal for psychological research which is hard to do at scale.
And the difference between groups is so large that the noise would have to be at unheard levels to taint the finding.
It should be ok to just say "we don't know yet, we're looking into that", but that isn't the world we live in.
It's september and september never ends
imo the most interesting result is that the brains of the group that had done sessions 1-3 without the search engine or LLM aids lit up like christmas trees in session 4 when they were given LLMs to use, and that's what the paper's conclusions really focus on.
> No! Please do not use the words like “stupid”, “dumb”, “brain rot”, "harm", "damage", "passivity", "trimming" and so on. It does a huge disservice to this work, as we did not use this vocabulary in the paper, especially if you are a journalist reporting on it
Maybe it's not safe to say so far, but it has been my experience using ChatGPT to code for eight months. My brain is getting slower and slower, and that study makes a hell of a lot of sense to me.
And I don't think that we will see new studies on this subject, because those leading society as a whole don't want negative press towards AI.
All we can say right now is "we don't really know how it affects our brains", and we won't until we get some studies (which is what the underlying paper was calling for, more research).
Personally I do think we'll get more studies, but the quality is the question for me - it's really hard to do a study right when, by the time it's done, there have been two new generations of LLMs released, making the study data potentially obsolete. So researchers are going to be tempted to go faster, use fewer people, and be less rigorous overall, which in turn may make for bad results.
This article is focused on essay writing, but I swear I've experienced cognitive decline when using AI tools a bit too much to help solve programming-related problems. When dealing with an unfamiliar programming ecosystem it feels so easy and magical to just keep copy / pasting error outputs until the problem is resolved. Previously solving the problem would've taken me longer but I would've also learned a lot more. Then again, LLMs also make it way easier to get started and feel like you're making significant progress, instead of getting stuck at the first hurdle. There's definitely a balance. It requires a lot of willpower to sit with a problem in order to try and work through it rather than praying to the LLM slot machine for an instant solution.
I've had the opposite experience, but my approach is different. I don't just copy/paste errors, accept the AI's answer when it works, and move on. I ask follow up questions to make sure I understand why the AI's answer works. For example, if it suggests running a particular command, I'll ask it to break down the command and all the flags and explain what each part is doing. Only when I'm satisfied that I can see why the suggestion solves the problem do I accept it and move on to the next thing.
The tradeoff for me ends up being that I spend less time learning individual units of knowledge than if I had to figure things out entirely myself e.g. by reading the manual (which perhaps leads to less retention), but I learn a greater quantity of things because I can more rapidly move on to the next problem that needs solving.
I've tried a similar approach and found it very prone to hallucination[0]. I tend to google things first and ask an LLM as a fallback, so maybe it's not a fair comparison, but what do I need an LLM for if a search engine can answer my question?
[0]: Just the other day I asked ChatGPT what a colon (':') after systemd's ExecStart= means. The correct answer is that it inhibits environment variable expansion, but it kept giving me convincing yet incorrect answers.
While not foolproof, when you combine this with some basic fact-checking (e.g. quickly skim read a command's man page to make sure the explanation for each flag sounds right, or read the relevant paragraph from the manual) plus the fact that you see in practice whether the proposed solution fixes the problem, you can reach a reasonably high level of accuracy most of the time.
Even with the risk of hallucinations it's still a great time saver because you short-circuit the process of needing to work out which command is useful and reading the whole of the man page / manual until you understand which component parts do the job you want. It's not perfect but neither is Googling - that can lead to incorrect answers too.
To give an example of my own, the other day I was building a custom Incus virtual machine image from scratch from an ISO. I wanted to be able to provision it with cloud-init (which comes configured by default in cloud-enabled stock Incus images). For some reason, even with cloud-init installed in the guest, the host's provisioning was being ignored. This is a rather obscure problem for which Googling was of little use because hardly anyone makes cloud-init enabled images from ISOs in Incus (or if they do, they don't write about it on the internet).
At this point I could have done one of two things: (a) spend hours or days learning all about how cloud-init works and how Incus interacts with it until I eventually reached the point where I understood what the problem was; or (b) ask ChatGPT. I opted for the latter and quickly figured out the solution and why it worked, thus saving myself a bunch of pointless work.
For example, in this specific case, I am enough of a domain expert to know that this information is accessible by running `man systemd.service` and looking for the description of command line syntax (findable with grep for "ExecStart=", or, as I have now seen in preparing this answer, more directly with grep for "COMMAND LINES").
    [Service]
    ExecStart=/bin/echo $PATH

will log the expanded environment variable, while

    [Service]
    ExecStart=:/bin/echo $PATH

will log the literal string $PATH.

Also, there's a huge difference between passively watching a teacher write an explanation on a board, and interactively quizzing the teacher (or in this case, LLM) in order to gain a deeper and personalised understanding.
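If you want to see the difference for yourself, a throwaway unit works (a sketch; the unit name path-demo.service is just an illustrative placeholder):

    # /etc/systemd/system/path-demo.service (hypothetical throwaway unit)
    [Service]
    Type=oneshot
    # No ':' prefix: systemd expands $PATH before running the command.
    ExecStart=/bin/echo expanded: $PATH
    # ':' prefix: variable expansion is skipped, so the literal "$PATH" is echoed.
    ExecStart=:/bin/echo literal: $PATH

Then systemctl daemon-reload, systemctl start path-demo, and compare the two lines in journalctl -u path-demo.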
I think any developer worth their salt would use LLMs to learn quicker and arrive at conclusions quicker. There are some programming problems I run into when working on a new project that I've run into before but cannot recall what my last solution was, and it is frustrating; I could see how an LLM could help such a resolution come back quicker. Sometimes it's 'first time setup' stuff that you have not had to do for like 5 years, so you forget, and maybe you wrote it down on a wiki two jobs ago, but an LLM could help you remember.
I think we need to self-evaluate how we use LLMs so that they help us become better Software Engineers, not worse ones.
It’s really convenient. It also similarly rots the parts of the brain required for spatial reasoning and memory for a geographic area. It can also lead to brain rot with decision making.
Usually it’s good enough. Sometimes it leads to really ridiculous outcomes (especially if you never double check actual addresses and just put in a business name or whatever). In many edge cases depending on the use case, it leads to being stuck, because the maps data is wrong, or doesn’t have updated locations, or can’t consider weather conditions, etc. especially if we’re talking in the mountains or outside of major cities.
Doing it blindly has led to numerous people dying by stupidly getting themselves into more and more dumb situations.
People still got stuck using paper maps. Sometimes they even died. But it was much rarer, and people were more aware that they were lost, instead of persisting in thinking they weren’t. So, different failure modes.
Paper maps were very inconvenient, so people dealt with it by using more human interaction and adding more buffer time. Which had its own costs.
In areas where there are active bad actors (Eastern Europe nowadays, many other areas in that region at times), it leads to actively pathological outcomes.
It is now rare for anyone outside of conflict zones to use paper maps except for specific commercial and gov’t uses, and even then they often use digitized ‘paper’ maps.
I also like preparing a draft and using an LLM for critique; it helps me figure out some blind spots or ways to articulate better.
- Learning how to solder
- Learning how to use a multimeter
- Learning to build basic circuits on breadboards
- Learning about solar panels, MPPT, battery management systems, and different variations of li-ion batteries
- Learning about the LoRa band / Meshtastic / how to build my own antenna
And every single one of these things I've learned I've also applied practically to experiment and learn more. I'm doing things with my brain that I couldn't do before, and it's great. When something doesn't work like I thought it would, AI helps me understand where I may have gone wrong, I ask it a ton of questions, and I try again until I understand how it works and how to prove it.
You could say you can learn all of this from YouTube, but I can't stand watching videos. I have a massive textbook about electronics, but it doesn't help me break down different paths to what I actually want to do.
And to be blunt: I like making mistakes and breaking things to learn. That strategy works great for software (not in prod obviously...), but now I can do it reasonably effectively for cheap electronics too.
Working these from text seems to be the hardest way I could think to learn them. I've yet to encounter a written description as to what it feels like to solder, what a good/bad job actually looks like, etc. A well shot video is much better at showing you what you need to do (although finding one is getting more and more difficult)
Being able to ask it stupid questions and edge cases is also something I like with LLMs. For example, I would propose a design for something (ex: a USB battery pack with LiFePO4 batteries that could charge my phone and be charged by solar at the same time), and it would say what it didn't like about my design and counter with its own. Then I would try to change aspects of its design to see "what would happen if ..", and it would explain why it chose a particular component or design choice, what my change would do, the trade-offs and risks, other paths to building it, etc. Those types of interactions are probably the best for me actually understanding things; they help me understand limitations and test my assumptions interactively.
Rant:
I _hate_ video tutorials. With a passion. If you can't be bothered to show pictures of how to use your product with a labeled diagram/drawing/photo of the buttons or connections, then I either won't buy it or I'll return it. I hate video reviews. I hate video repair instructions. I hate spending 15 minutes jumping back and forth between two segments of a YouTube video, trying to find the exact correct frame each time so I can see what button the person is touching while listening to their blather so I don't miss the keyword I heard last time, just so I can compare two different sections when I could have had two pictures on screen at the same time (if I were on desktop, this would be a trivial fix, but not so much on mobile). I hate having VPNs and other products advertised at me in ways that actively disrupt my chain of thought (vs static ads that I can ignore/scroll past). I hate not being able to just copy and paste a few simple instructions and an image for procedures that I'll have to repeat weekly. It would have taken you less effort to create, and I'd be more likely to pay you for your time.
YouTube videos are like flash-based banner ads, but worse. Avoid them like the plague.
End rant.
Like you, I don't like watching videos. However, the web also has text, the same text used to train the LLMs that you used.
> When something doesn't work like I thought it would, AI helps me understand where I may have gone wrong, I ask it a ton of questions, and I try again until I understand how it works and how to prove it.
Likewise, but I would have to ask either the real world or written docs.
I'm glad you've found a way to learn with LLMs. Just remember that people have been learning without LLMs for a long time, and it is not at all clear that LLMs are a better way to learn than other methods.
I think the problem was all of the getting started guides didn't really solve problems I cared about, they're just like "see, a light! isn't that neat?" and then I get bored and impatient and don't internalize anything. The textbooks had theory but so much of it I would forget most of it before I could use it and actually learn. Then when I tried to build something actually interesting to me, I didn't actually understand the fundamentals, it always fails, Google doesn't help me find out why because it could be a million things and no human in my life understands this stuff either, so I would just go back to software.
It could be LLMs are at least possibly better for certain people to learn certain things in certain situations.
> However, the web also has text, the same text used to train the LLMs that you used.
The person you're responding to isn't denying that other people learn from those. But they're explicit that having the text isn't helpful either:
> I have a massive textbook about electronics, but it doesn't help me break down different paths to what I actually want to do.
You might ask "What do I need to pay attention to when designing this type of electronic circuit", the people at risk of cognitive decline instead ask "design this electronic circuit for me".
I firmly believe that the latter group will suffer observable cognitive decline over the span of a few years unless they continue to exercise their brain in the same ways they used to, and I think the majority won't bother to do that - why spend much effort when little effort do trick?
And yet...somehow...humans have been able to learn and do these things (and do them well) for ages, with no LLMs around (or the stupid amount of capital being burned at the LLM stake).
And I want to hit the next person with a broom or something, likely over and over again, who says LLMs = AI.
/facepalm.
The study shows different brain patterns during AI-assisted writing, not permanent damage. Lower EEG activity when using a tool is expected, just as you'd see less mental-math activity when using a calculator.
The study translates temporary, task-specific neural patterns into "cognitive decline" and "severe cognitive harm." The actual study measured brain activity during essay writing, not lasting changes.
Plus, surface electrical measurements can't diagnose "cognitive debt" or deep brain changes. The authors even acknowledge this. Also, "83.3% couldn't quote their essay" equates to 15 out of 18 people?
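A quick back-of-the-envelope check on that figure (the 18-person group size comes from the comments above, not verified against the paper):

    # 83.3% of an 18-person group is exactly 15 people.
    group_size = 18
    print(round(0.833 * group_size))  # -> 15
    print(15 / group_size)            # -> 0.8333...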
Basically, participants spent less than half an hour, 4 times, over 4 months, writing some bullcrap SAT type essay. Some participants used AI.
So to accept the premise of the article, using an AI tool once a month for 20 minutes caused noticeable brain rot. It is silly on its face.
What the study actually showed is that people don't have an investment in, or strong memory of, output they didn't produce. Again, this is a BS essay written (mostly by undergrads) in 20 minutes, so not likely to be deep in any capacity. So to extrapolate: if you have a task that requires you to understand the output, you are less likely to have a grasp of it if you didn't help produce the output. This would also be true of work some other person did.
The problem with LLMs is, when you spend hours feeding prompts to solve a problem, you actually did help (a lot!) to produce the output.
I actively use AI to research, question and argue a lot, this pushes me to reason a lot more than I normally would.
Today's example:
- recognize docs are missing for a feature
- have AI explore the code to figure out what's happening
- back and forth for hours trying to find how to document, rename, refactor, improve, write mermaid charts, stress over naming to be as simple as possible
The only step I'm doing less of is the exploration/search one, because an LLM can process a lot more text than I can at the same time. But for every other step I am pushing myself to think more, and more profoundly, than I would without an LLM, because gathering the same amount of information on my own would've been too exhausting to proceed with this.
Sure, it may have spared me to dig into mermaid too, for what is worth.
So yes, lose some, win others, albeit in reality no work would've been done at all without the LLM enabling it. I would've moved on to another mundane task such as "update i18n date formatting for Swiss German customers".
> 83.3% of LLM users were unable to quote even one sentence from the essay they had just written
Not sure why you need to wire up an EEG; it's pretty obvious that they simply did _not_ write the essay, the LLM did it for them, and they likely didn't even read it, so there is no surprise that they don't remember what never properly passed through their own thinking apparatus.
The idea that I would say 'write an essay on X' and then never look at the output is kind of wild. I guess that's vibe writing instead of vibe coding.
On that note, reading the ChatGPT-esque summary in the linked article gave me more brain damage than any AI I've used so far