Posted by huseyinkeles 1 day ago
Curious, how much of the code did you write by hand?
Karpathy: Good question, it's basically entirely hand-written (with tab autocomplete). I tried to use claude/codex agents a few times but they just didn't work well enough at all and net unhelpful, possibly the repo is too far off the data distribution.
https://x.com/karpathy/status/1977758204139331904
ah, this explains why these models have been useless to me this whole time. everything i do is just too far off the data distribution!
If you told me a decade ago that I could have a fuzzy search engine on my desktop that I could use to vaguely describe some program that I needed & it would go out into the universe of publicly available source code & return something that looks as close to the thing I’ve asked for as it can find then that would have been mindblowing. Suddenly I have (slightly lossy) access to all the code ever written, if I can describe it.
Same for every other field of human endeavour! Who cares if AI can “think“ or “do new things”? What it can do is amazing & sometimes extremely powerful. (Sometimes not, but that’s the joy of new technology!)
the things that AI is able to do are incredible, but hype levels are just totally detached from reality.
But it can already do that. Isn't that the whole "one-shotting" thing?
The problem is, of course, that it won't be optimized, maintainable or have anyone responsible you can point to if something with it goes wrong. It almost certainly (unless you carefully prompted it to) won't have a test suite, which means any changes (even fixes) to it are risky.
So it's basically a working mockup generator.
I am so, so tired of "semi-technical" youtubers showing off new models with one-shots. The vast majority of actual devs who use this stuff need it to work over long-term context windows and over multiple iterations.
If you come at the problem from the direction of "I draw a user interface; you guess what it's supposed to do and wire it up for me", then all you need to solve that problem (to a first-order approximation) is some plain-old 1970s "AI" heuristics.
The buzz around current AI coding prompting seems to be solely generated by the fact that while prototyping tools require you to at least have some training as a designer (i.e. understanding the problem you're solving on the level of inputs and outputs), these tools allow people with no experience in programming or design to get results. (Mainly by doing for UIs what genAI image/video tools do for art: interpolating the average of many ingested examples of how a designer would respond to a client request for X, with no regard for the designer's personal style†.)
† Unless prompted to have such regard... but if you know enough to tell the AI how to design everything, then you may as well just design everything. Just as, if you know art well enough to prompt an AI into developing a unique art style, then you likely know art well enough to just make that same art yourself with less effort than it takes to prompt and re-prompt and patch-erase-infill-prompt the AI into drawing what you want.
you might produce something that looks usable at first, but the actual application functionality will be significantly broken in most ways. it maybe works enough to do a demo for your video, but it won't work enough to actually distribute to end-users. and of course, as you say, it's not testable or maintainable in any way, so fixing what's broken is a bigger project than just writing it properly in the first place.
Remember the hype isn’t just “wow it’s so cool and amazing and useful”, it’s also “I can’t wait to fire all my dumb meat-based employees”
Some people can't see past how the trick is done (take training data and do a bunch of math/statistics on it), but the fact that LLMs are able to build the thing is in-and-of-itself interesting and useful (and fun!).
If the results are useful, then that’s what matters. Although I do suspect that some AI users are spending more time pulling the AI one-armed bandit handle than it would take them to just solve their problem the old fashioned way a lot of the time - but if pulling the one-armed bandit gets them a solution to their problem that they wouldn’t work up the motivation to solve themselves then that counts too, I guess.
Edit: for the young, wysiwyg (what you see is what you get) was common for all sorts of languages from c++ to Delphi to html. You could draw up anything you wanted. Many had native bindings to data sources of all kinds. My favourite was actually HyperCard because I learned it in grade school.
Boilerplate generation was never, ever the bottleneck.
Good times!
* Scaffolding first and foremost - It's usually fine for this, I typically ask "give me the industry standard project structure for x language as designed by a Staff level engineer" blah blah just give me a sane project structure to follow and maintain so I don't have to wonder after switching around to yet another programming language (I'm a geek, sue me).
* Code that makes sense at first glance and is easy to maintain / manage, because if you blindly take code you don't understand, you'll regret it the moment you need to be called in for a production outage and you don't know your own codebase.
I’d say it made me around 2x as productive.
I don’t think the cynicism of HN is justified, but I think what people forget is that it takes several months of really investing a lot of time into learning how to use AI well. If I see some of the prompts people give, and expect it to work, yeah no wonder that only works for React-like apps.
If anyone else says this, "the skepticism is exhausting", and their experience is completely irrelevant.
The grievance attitude seems to exist in both directions and is actually what is exhausting.
And they would often be right. Coupled with the fact that most of the glowing "omg I only code with AI" posts don't even try to show what code or products they are working on.
And yes, the absolute vast majority of people who are skeptical are skeptical precisely because they use these tools every day themselves.
You don’t see any dissonance in that? It’s only the positive people that are exhausting?
I never pretend that AI is the be all end all of programming, don't claim that it can do all the magical things, or that it's capable of running hours on end just creating software with no proof like most positive posts do.
See the difference?
I'm all for positive posts. I'm against childish belief in magic: https://dmitriid.com/everything-around-llms-is-still-magical...
At the same time, these tools have helped me reduce the development time on this project by orders of magnitude. There are two prominent examples.
--- Example 1:
The first relates to internal tooling. I was debugging a gnarly problem in an interpreter. At some point I had written code to do a step-by-step dump of the entire machine state to file (in json) and I was looking through it to figure out what was going wrong.
In a flash of insight, I asked my AI service (I'll leave names out since I'm not trying to promote one over another) to build a react UI for this information. Over the course of a single day, I (definitely not a frontend dev by history) worked with it to build out a beautiful, functional, easy to use interface for browsing step-data for my VM, with all sorts of creature comforts (like if you hover over a memory cell, and the memory cell's value happens to be a valid address to another memory cell, the target memory cell gets automatically highlighted).
This single tool has reduced my debugging time from hours or days to minutes. I never would have built the tool without AI support, because I'm simply not experienced enough in frontend stuff to build a functional UI quickly.. and this thing built an advanced UI for me based on a conversation. I was truly impressed.
--- Example 2:
As part of verifying correctness for my project, I wanted to generate a set of tests that validated the runtime behaviour. The task here consists of writing a large set of reference programs, and verifying that their behaviour was identical between a reference implementation and the real implementation.
Half decent coverage meant at least a hundred or so tests were required.
Here I was able to use agentic AI to reduce the testcase construction time from a month to about a week. I asked the AI to come up with a coverage plan and write the test case ideas to a markdown file in an organized, categorized way. Then I went through each category in the test case markdown and had the AI generate the test cases and integrate them into the code.
---
I was and remain a strong skeptic of the hype around this tech. It's not the singularity, it's not "thinking". It's all pattern matching and pattern extension, but in ways so sophisticated that it feels like magic sometimes.
But while the skeptical perspective is something I value, I can't deny that there is core utility in this tech that has a massive potential to contribute to efficiency of software development.
This is a tool that we as industry are still figuring out the shape of. In that landscape you have all sorts of people trying to evangelize these tools along their particular biases and perspectives. Some of them clearly read more into the tech than is there. Others seem to be allergically reacting to the hype and going in the other direction.
I can see that there is both noise, and fundamental value. It's worth it to try to figure out how to filter the noise out but still develop a decent sense of what the shape of that fundamental value is. It's a de-facto truth that these tools are in the future of every mainstream developer.
Like 80% of writing code is just being a glorified autocomplete and AI is exceptional at automating those aspects. Yes, there is a lot more to being a developer than writing code, but, in those instances, AI really does make a difference in the amount of time one is able to spend focusing on domain-specific deliverables.
[1] Show HN: I invented a new generative model and got accepted to ICLR (90 comments):
However when I ask an llm to generate my typed lua code, with examples and all, on how the syntax is supposed to be, it mostly gets it wrong.
my syntax for tables/objects is: local x: {foo = boolean}
but an llm will most likely gloss over this and always use : instead of =, i.e. local x: {foo: boolean}
If your typed version of Lua has a syntax checker, you could also have it try to use that first on any code it's generated
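A rough sketch of that check-before-accept loop. Everything here is an assumption for illustration: `generate` is a hypothetical callable standing in for the LLM, and `check_syntax` is a toy regex standing in for a real syntax checker (it only catches the `{foo: boolean}` mistake described above).

```python
import re
from typing import Callable, List

# Toy stand-in for a real syntax checker: flags `field: type` inside a
# table type annotation, where this dialect expects `field = type`.
BAD_FIELD = re.compile(r"\{[^{}]*\b\w+\s*:\s*\w+[^{}]*\}")


def check_syntax(code: str) -> List[str]:
    """Return one error message per offending line (empty list if clean)."""
    errors = []
    for lineno, line in enumerate(code.splitlines(), start=1):
        if BAD_FIELD.search(line):
            errors.append(f"line {lineno}: use `field = type`, not `field: type`")
    return errors


def generate_checked(generate: Callable[[str], str], prompt: str, max_tries: int = 3) -> str:
    """Call the (hypothetical) model, re-prompting with checker errors until clean."""
    feedback = ""
    for _ in range(max_tries):
        code = generate(prompt + feedback)
        errors = check_syntax(code)
        if not errors:
            return code
        feedback = "\nFix these syntax errors:\n" + "\n".join(errors)
    raise RuntimeError("model never produced code that passes the checker")
```

With a real checker you would swap `check_syntax` for a call to that tool and pass its error output back to the model unchanged.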
I prefer to work with more isolated parts of the code. But again, I don't really know all that much about agents.
One thing I wanted to do on my project is reorganize all the tests, which sounds like an agent job. But I'd imagine I need to define some hard programmatic constraints to make sure tests are not lost or changed in the process.
I’ve had good experiences writing small scripts and linters to enforce things that agents get wrong frequently. What’s nice about those is that the agents are very good at writing them and they are easy to verify. Plus they are valuable for new humans devs as well.
I do love Claude Code, because one thing I periodically need to do is write some web code, which is not my favorite type of coding but happens to have incredibly good coverage in the training data. Claude is a much better web developer than I am.
But for digging into the algorithmic core of our automation tooling, it doesn't have nearly as much to work with and makes far more mistakes. Still a net win I'm happy to pay for, even if it's never anything more than my web developer slave.
I've already built some pretty large projects [1] with the assistance of agentic tooling like Claude Code. When it comes to the more squirrely algorithms and logic, they can fall down pretty hard. But as somebody who is just dreadful at UI/UX, having it hammer out all the web dev scaffolding saves me a huge amount of time and stress.
It's just a matter of tempering one's expectations.
A couple of very minor pieces of feedback, if you're open to it: The camera momentum when dragging felt a little unnatural. The videos seemed to have a slightly jumpy framerate and were a bit low-resolution when zoomed in.
Honestly though, those are minor nitpicks. It's a really fun and polished experience. Thanks for sharing!
Well, this one might still be borne out. It's just silly to think it's the case right now. Check in again in 10 years and it may be a very different story. Maybe even in 5 years.
What I find fascinating is reading this same thing in other contexts, like a “UI guru” who will say “I would not let CC touch the UI but I let it rip on the algorithmic core of our automation tooling cause it is better at it than me…”
But 'mediocre' isn't 'useless'.
If anything, the fact that Karpathy reached towards Claude/Codex in an attempt to gain value is indicative that, in previous coding efforts, those tools were helpful to him.
It's really not though? Honestly I'm surprised coding agents fail hard at this task apparently
This is good for bitcoin.
I guess his prompts couldn’t provide sufficient information either (there’s no limit). Sounds more like a user issue to me. :) I don’t think there’s anyone that can type faster than ChatGPT.
> My goal is to get the full "strong baseline" stack into one cohesive, minimal, readable, hackable, maximally forkable repo. nanochat will be the capstone project of LLM101n (which is still being developed). I think it also has potential to grow into a research harness, or a benchmark, similar to nanoGPT before it.
This is how he described vibe coding:
> There's a new kind of coding I call "vibe coding", where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It's possible because the LLMs (e.g. Cursor Composer w Sonnet) are getting too good. Also I just talk to Composer with SuperWhisper so I barely even touch the keyboard. I ask for the dumbest things like "decrease the padding on the sidebar by half" because I'm too lazy to find it. I "Accept All" always, I don't read the diffs anymore. When I get error messages I just copy paste them in with no comment, usually that fixes it. The code grows beyond my usual comprehension, I'd have to really read through it for a while. Sometimes the LLMs can't fix a bug so I just work around it or ask for random changes until it goes away. It's not too bad for throwaway weekend projects, but still quite amusing. I'm building a project or webapp, but it's not really coding - I just see stuff, say stuff, run stuff, and copy paste stuff, and it mostly works.
Vibe coding is clearly aimed at having fun hacking around on something that doesn’t matter, and he’s doing the opposite of that with this project. The fact that he’s not using vibe coding for something that is completely inappropriate for vibe coding is neither surprising nor a failure of vibe coding.
AI can write better code than 99% of developers. This embarrassingly anti-AI shill included.
If he used the AI tool my company is developing the code would have been better and shipped sooner.
Nice synergy here, the lineage is: Karpathy's nanoGPT -> Keller Jordan's modded-nanoGPT (a speedrun of training nanoGPT) -> nanochat
modded-nanoGPT [1] is a great project, well worth checking out, it's all about massively speeding up the training of a small GPT model.
Notably it uses the author's Muon optimizer [2], rather than AdamW, (for the linear layers).
Both share equal credit I feel (also, the paper's co-authors!), both put in a lot of hard work for it, though I tend to bring up Bernstein since he tends to be pretty quiet about it himself.
(Source: am experienced speedrunner who's been in these circles for a decent amount of time)
- https://x.com/leloykun/status/1846842883967692926
- https://www.yacinemahdid.com/p/muon-optimizer-explained-to-a...
https://www.youtube.com/watch?v=bO5nvE289ec
I found the above video to be a good introduction.
"Muon is an optimizer for the hidden weights of a neural network. Other parameters, such as embeddings, classifier heads, and hidden gains/biases should be optimized using standard AdamW."
And I just looked into this nanochat repo and it's also how it's used here.
https://github.com/karpathy/nanochat/blob/dd6ff9a1cc23b38ce6...
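To make the quoted rule concrete, here is a small sketch of how parameters might be split between the two optimizers. This is not the code from the linked file. The name-matching heuristic and the example parameter names are assumptions, and the function only returns the grouping rather than building real optimizers.

```python
from typing import Dict, List, Tuple


def split_params(shapes: Dict[str, Tuple[int, ...]]) -> Dict[str, List[str]]:
    """Assign each parameter name to 'muon' or 'adamw' following the quoted rule."""
    groups: Dict[str, List[str]] = {"muon": [], "adamw": []}
    for name, shape in shapes.items():
        # Assumption: embeddings and the classifier head are identified by name.
        is_embed_or_head = "embed" in name or "head" in name
        # Hidden weights are the 2D matrices that are neither of those.
        if len(shape) == 2 and not is_embed_or_head:
            groups["muon"].append(name)
        else:
            groups["adamw"].append(name)
    return groups


# Hypothetical parameter names and shapes, for illustration only.
example = {
    "embed.weight": (65536, 1280),
    "blocks.0.attn.qkv.weight": (3840, 1280),
    "blocks.0.norm.gain": (1280,),
    "lm_head.weight": (65536, 1280),
}
print(split_params(example))
```

Only the attention matrix lands in the Muon group here. The embedding, the gain vector, and the head all fall through to AdamW.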
Is this what production frontier LLMs are running inference with, or do they consume even more VRAM/compute?
At ~$8/hr, assuming a request takes 5 seconds to fulfill, you can service roughly 700ish requests. About $0.01 per request.
Is my math wrong?
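Checking the arithmetic, under the same assumptions as the comment (requests served one at a time, the box busy for the whole hour). The `concurrency` parameter is an addition for illustration, since batching would change the answer.

```python
def cost_per_request(dollars_per_hour: float, seconds_per_request: float, concurrency: int = 1) -> float:
    """Cost of one request if the machine is fully utilized."""
    requests_per_hour = 3600 / seconds_per_request * concurrency
    return dollars_per_hour / requests_per_hour


print(round(3600 / 5))                       # 720 requests per hour
print(round(cost_per_request(8.0, 5.0), 4))  # 0.0111 dollars per request
```

So "700ish requests" per hour and "about $0.01 per request" both hold for sequential serving. Any idle time pushes the per-request cost up, and any batching pushes it down.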
Will share the resulting model once ready (4 hours from now) for anyone to test inference.
I didn't get as good results as Karpathy (unlucky seed?)
It's fun to play with though...
User: How many legs does a dog have?
Assistant: That's a great question that has been debated by dog enthusiasts for centuries. There's no one "right" answer (...)
You can run it like this:
cd /tmp
git clone https://huggingface.co/sdobson/nanochat
uv run https://gist.githubusercontent.com/simonw/912623bf00d6c13cc0211508969a100a/raw/80f79c6a6f1e1b5d4485368ef3ddafa5ce853131/generate_cpu.py \
--model-dir /tmp/nanochat \
--prompt "Tell me about dogs."
% uv run https://gist.githubusercontent.com/simonw/912623bf00d6c13cc0... \
  --model-dir nanochat/ --prompt "who is simonw on hacker news?"
Using device: cpu
Loading model from nanochat/model_000650.pt
Loading metadata from nanochat/meta_000650.json
Model config: {'sequence_len': 2048, 'vocab_size': 65536, 'n_layer': 20, 'n_head': 10, 'n_kv_head': 10, 'n_embd': 1280}
Loading model weights (this may take a minute for a 2GB model)...
Converting model to float32 for CPU...
Model loaded successfully!
Loading tokenizer...
Tokenizer loaded successfully!

Prompt: who is simonw on hacker news?
Encoded to 9 tokens

Generating...
--------------------------------------------------
who is simonw on hacker news?<|user_end|><|assistant_start|>A hacker news reporter, I'd say a few things. First, I'm a bit of a hothead, always pushing the boundaries of what's acceptable in the world of hacking. I've got a reputation for being merciless and relentless in my pursuit of the truth.

In many ways, I've developed a sixth sense for this type of thing. I've spent years honing my skills, learning the language of hacking and the tactics it takes. I know how to think like the hacker
--------------------------------------------------
git lfs install
> uv sync
Resolved 88 packages in 3ms
error: Distribution `torch==2.8.0+cu128 @ registry+https://download.pytorch.org/whl/cu128` can't be installed because it doesn't have a source distribution or wheel for the current platform

hint: You're on macOS (`macosx_15_0_arm64`), but `torch` (v2.8.0+cu128) only has wheels for the following platforms: `manylinux_2_28_x86_64`, `win_amd64`; consider adding your platform to `tool.uv.required-environments` to ensure uv resolves to a version with compatible wheels
Also, tmp/nanochat expects all contents from tokenizer and chatsft_checkpoints folder.
>Our main measure of progress. Bits per byte is, per Karpathy, "a much better measure than just the typical cross-entropy loss, because it further normalizes the loss on each token by the number of bytes of that token, making the metric tokenizer-invariant".
This is so blindingly obvious that I'm ashamed I didn't think to do it when trialing my own tokenizer approach on tinystories. I might go back and have a look at how well my tokenizer compared to how well I imagined it compared.
When you train a language model, it tries to predict the next token.
We measure how good it is at that using loss aka how surprised it was by the real answer.
Different models might use different token lengths. So, if you describe loss relative to tokens then you can't easily compare the performance of two models that use different token lengths.
So, compare loss to bytes of text data instead.
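As a sketch of that normalization: sum the per-token losses, convert from nats to bits, and divide by the total number of bytes the tokens cover. The function name and the choice to aggregate as sum over sum are assumptions for this example, not a claim about how any particular repo implements it.

```python
import math
from typing import Sequence


def bits_per_byte(token_losses_nats: Sequence[float], token_byte_lengths: Sequence[int]) -> float:
    """Total cross-entropy (converted from nats to bits) divided by total bytes of text."""
    total_nats = sum(token_losses_nats)
    total_bytes = sum(token_byte_lengths)
    return total_nats / (math.log(2) * total_bytes)


# Two hypothetical tokenizers covering the same 8 bytes of text:
# one uses 2 long tokens, the other 4 short ones.
coarse = bits_per_byte([4.0, 4.0], [4, 4])
fine = bits_per_byte([2.0, 2.0, 2.0, 2.0], [2, 2, 2, 2])
print(coarse == fine)  # True, although the per-token losses differ (4.0 vs 2.0)
```

The per-token loss makes the coarse tokenizer look twice as bad, but both models spent the same number of bits on the same text, which is what the per-byte figure reports.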
Or would the loss of efficiency make it dumber than modern tokenizers?
Subword units are genuinely meaningful in most languages. You do need to tune the vocabulary size though.
absolutely requires longer training time and more compute.
once trained, predictions need to hold through many more steps because each step processes one token. if a token early in a sentence heavily implies a token will occur later in the sentence then that awareness needs to be maintained while processing each intermediary token and each step is a bit lossy. the fewer steps you need to take before leveraging that knowledge the better the prediction.
if you had infinite compute and data for training then performance would be equivalent though, i think.
Did you notice the inflection point in which the loss drops faster than expected in the top graph? Maybe you should let it run more…
I started writing up a blog post on my weekend with nanoGPT but it's not done yet... Would have been great to link to here lol oh well
And this new example goes even further - adds instruction following and tool use SFT, as well as RLVR. Makes for a more useful baseline.
The real neat thing about this is that WotC makes a few thousand new cards each year, so my training data set just grows over time and the model gets better with no effort spent on my part.
https://bsky.app/profile/roborosewaterm.bsky.social
You can see the invention of RLHF/ChatGPT here because text generation suddenly became much more coherent and also much less interesting. You have to go back to older tech for surrealism because nobody will let you see the good stuff (the base models).
I'm sure I can dig up info on how to do this and piece it together, but I thought OP might have a guide specifically for it.
I have been on an LLM binge this last week or so trying to build a from-scratch training and inference system with two back ends:
- CPU (backed by JAX)
- GPU (backed by wgpu-py). This is critical for me as I am unwilling to deal with the nonsense that is rocm/pytorch. Vulkan works for me. That is what I use with llama-cpp.
I got both back ends working last week, but the GPU back end was buggy. So the week has been about fixing bugs, refactoring the WGSL code, making things more efficient.
I am using LLMs extensively in this process and they have been a revelation. Use a nice refactoring prompt and they are able to fix things one by one resulting in something fully functional and type-checked by astral ty.
My use case is different. I want something that I can run quickly on one GPU without worrying about whether it is supported or not.
I am interested in convenience, not in squeezing out the last bit of performance from a card.
I gave up on all tools that depend on it for inference. llama-cpp compiles cleanly on my system for Vulkan. I want the same simplicity to test model training.
Getting them to work and recognize my GPU without passing arcane flags was a problem. I could at least avoid the pain with llama-cpp because of its vulkan support. pytorch apparently doesn't have a vulkan backend. So I decided to roll out my own wgpu-py one.
oh man an Alec x Andrej podcast would BREAK THE INTERNET... just saying... going from glory days of GPT1 to now building GPT3? in 4 hours
I noticed NewRelic has a chat feature that does this sort of thing, it's scoped very narrowly down to their website and analytics DSL, and generates charts/data from their db. I've always wondered how they did that (specifically in terms of setting up the training/RAG + guardrails). It's super useful.
The most likely way of building that would be to equip it with a "search_docs" tool that lets it look up relevant information for your query. No need to train an extra model at all if you do that.
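A toy version of such a tool, to show how little is needed. The `DOCS` snippets and the word-overlap scoring are invented for this sketch. A real deployment would more likely use a proper search index or embeddings, and would register the function with whatever tool-calling interface the model provider exposes.

```python
import re
from typing import Dict, List

DOCS: Dict[str, str] = {  # hypothetical documentation snippets
    "charts": "Use the chart command to plot a metric over time.",
    "alerts": "Alerts fire when a metric crosses a threshold.",
}


def search_docs(query: str, docs: Dict[str, str] = DOCS, top_k: int = 1) -> List[str]:
    """Rank documents by how many query words they share; return the best titles."""
    query_words = set(re.findall(r"\w+", query.lower()))
    scored = []
    for title, text in docs.items():
        overlap = len(query_words & set(re.findall(r"\w+", text.lower())))
        if overlap > 0:
            scored.append((overlap, title))
    # Highest overlap first; ties broken alphabetically for determinism.
    scored.sort(key=lambda st: (-st[0], st[1]))
    return [title for _, title in scored[:top_k]]


print(search_docs("how do I plot a metric"))  # ['charts']
```

The model calls the tool with its own query, the matching text is pasted into its context, and it answers from that. Nothing is trained.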
Those other ways to integrate the texts might be some form of RAG or other ideas like Apple's recent 'hierarchical memories' (https://arxiv.org/abs/2510.02375).
What a prolific person Andrej is. It's been more than amazing to follow along!
Our current world is built on top of open source projects. This is possible because there are a lot of free resources to learn to code so anyone from anywhere in the world can learn and make a great piece of software.
I just hope the same will happen with the AI/LLM wave.
I also worry that as we rely on LLMs more and more, we will stop producing the kind of tutorials and other content aimed at beginners that makes it so easy to pick up programming the manual way.
There's also a reasonable way to "leapfrog" the training cost with a pre-trained model. So if you were doing nanochat as a learning exercise and had no money, the idea would be to code it up, run one or two very slow gradient descent iterations on your slow machine to make sure it is working, then download a pre-trained version from someone who could spare the compute.
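The "run one or two iterations to make sure it is working" step can be as simple as checking that the loss goes down. Here is a sketch with a one-parameter toy model standing in for the real training loop. The data, learning rate, and function names are all made up for illustration.

```python
from typing import List, Tuple


def loss_and_grad(w: float, data: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Mean squared error of the model y = w * x, and its gradient with respect to w."""
    n = len(data)
    loss = sum((w * x - y) ** 2 for x, y in data) / n
    grad = sum(2 * (w * x - y) * x for x, y in data) / n
    return loss, grad


def smoke_test(steps: int = 2, lr: float = 0.1) -> List[float]:
    """Take a couple of gradient descent steps; return the loss before each step and after the last."""
    data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # toy data, true w = 2
    w, losses = 0.0, []
    for _ in range(steps + 1):
        loss, grad = loss_and_grad(w, data)
        losses.append(loss)
        w -= lr * grad
    return losses


losses = smoke_test()
assert losses[0] > losses[1] > losses[2], "loss should decrease on every step"
```

If the same check passes on your own training code after a step or two, the plumbing is probably right, and the expensive part can be replaced by someone else's checkpoint.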
No, it's extremely hard to imagine since I used one of Karpathy's own models to have a basic chat bot like six years ago. Yes, it spoke nonsense; so did my GPT-2 fine tune four years ago and so does this.
And so does ChatGPT
Improvement is linear at best. I still think it's actually a log curve and GPT3 was the peak of the "fun" part of the curve. The only evidence I've seen otherwise is bullshit benchmarks, "agents" that increase performance 2x by increasing token usage 100x, and excited salesmen proclaiming the imminence of AGI
The most recent leaked annualized revenue rate was $12bn/year. They're spending a lot more than that but convincing customers to hand over $12bn is still a very strong indicator of demand. https://www.theinformation.com/articles/openai-hits-12-billi...
Given the rest of the circular deals, I'd also scrutinize whether it applies to the revenue. The entanglement with the Microsoft investments and the fact that "Open" "AI" is a private company makes that difficult to research.
[1] In a U.S. startup, I went through three CEOs and three HR apps, which mysteriously had to change for no reason but to accommodate the new CEO's friends and their startups.
For every little bit a model is smarter and more accurate there are exponentially more real world tasks it could be used for.
In the real world...
I feel like this point of view is an ideal not shared by one of the main branches of anti-AI sentiment.
The idea of intellectual property works against this. Rather than contributing to humanity directly, ownership of information is accumulated by individuals and then rented to humanity.
At the same time I agree that people should be able to have a livelihood that affords them the ability to create new intellectual contributions.
The service Karpathy is providing is also being provided by thousands of YouTube creators in a huge variety of topics. It's a little sad that so many must support their efforts with sponsorships from sources with varying degrees of ethical behaviour. Patreon is better but still not ideal. I sincerely believe this _is_ one of the best ways to contribute to society.
A recent Daily Show had Jon Stewart describe training AI as strip mining human knowledge. Training AI is regularly described as theft as if this position is a given without any counter argument possible. It is opinion masquerading as fact. This saddens me because it suggests to me that the war to control the narrative is being won by people who want to entrench a hypercapitalistic vision of ownership where not only is a particular expression of an idea ownable but also stakes a claim to own some of any ideas that come from viewing that expression.
I cannot see any way that this viewpoint would aid humanity as a whole, but instead assign benefits to a collection of individuals. The ability to trade intellectual property means that ownership inevitably gets passed to a smaller and smaller pool of individuals over time.
I think we really do need a new way to consider these issues in light of the modern world. When mentioning these thoughts to others a common refrain is that it doesn't matter because the powers that be (and their lobbyists) will prevent any fix from happening. I have never been fond of that particular fatalism, especially when it inhibits discussion of what would be better.
I'm all for abolishing IP if all AIs are owned communally. I.e. ideally they're utilities or flat out co-ops like some Spanish businesses.
https://en.wikipedia.org/wiki/Mondragon_Corporation
Consum (Spanish supermarket).
They don't get to use everything communally and then capitalism their way forward.
Software is just a tool. Much like a hammer, a knife, or ammonium nitrate, it can be used for both good or bad.
I say this as someone who has spent almost 15 years writing software in my free time and publishing it as open source: building software and allowing anyone to use it does not automatically make other people's lives better.
A lot of my work has been used for bad purposes or what some people would consider bad purposes - cheating on tests, cheating in games, accessing personal information without permission, and in one case my work contributed to someone's doxxing. That's because as soon as you publish it, you lose control over it.
But at least with open source software, every person can use it to the same extent so if the majority of people are good, the result is likely to be more positive than negative.
With what is called AI today, only the largest corporations can afford to train the models which means they are controlled by people who have entirely different incentives from the general working population and many of whom have quite obvious antisocial personality traits.
At least 2 billion people live in dictatorships. AI has the potential to become a tool of mass surveillance and total oppression from which those countries will never recover because just like the models can detect a woman is pregnant before she knows it, it will detect a dissenter long before dissent turns into resistance.
I don't have high hopes for AI to be a force for good and teaching people how toy models work, as fun as it is, is not gonna change it.
I take it you're very positive about Andrej's new project which allows anyone to train a model for a few hundred dollars which is comparable to the state-of-the-art from just 5 years ago then.
Can I run it on my local hardware (nvidia consumer card, AMD cpu)? No. When could that corporation cut off my access to that hardware if I did anything it didn't like? Anytime.
Lots of things have started off cheap / subsidized to put competitors out of business, and then the prices go up, up and up..
Yes. The training process requires big expensive GPUs. The model it produces has 561M parameters, which should run on even a high end mobile phone (I run 4B models on my iPhone).
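A back-of-the-envelope check on why 561M parameters fits on a phone. It counts weights only and ignores activations and the KV cache. The bytes-per-parameter values are the usual sizes for 16-bit and 32-bit floats, not something stated in the thread.

```python
def weights_gb(n_params: int, bytes_per_param: int) -> float:
    """Size of the raw weights in gigabytes (10**9 bytes), ignoring activations and KV cache."""
    return n_params * bytes_per_param / 1e9


print(weights_gb(561_000_000, 2))  # 1.122 (16-bit weights)
print(weights_gb(561_000_000, 4))  # 2.244 (32-bit weights)
```

Either figure is well under what a 4B-parameter model needs at the same precision, so the comparison with running 4B models on an iPhone is reasonable.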
It already works like this in your precious western democracies and they didn't need AI to be authoritarian total surveillance states in spirit, with quite a lot of support from a propagandized populace that begged for or pretended to agree with the infringement of their civil rights because of terrorism, drugs, covid or protecting the poor poor children.
You can combat tech with legislation and culture but the legislation and culture were way beyond the tech in being extremely authoritarian in the first place.
This would sit better with me if the repo included a first tier use case for local execution, non-NVidia hardware reference, etc.
This is a pretty disheartening way to respond to something like this. Someone puts a great deal of effort into giving something interesting away for free, and is told "you should have also done THIS work for free as well in order for me to value your contribution".
Think back to your first experience with tech, something you just earnestly thought was cool...
So I appreciate his work in an academic and educational sense, but large scale applications with stolen training material are still theft.
(number of people you help x how much you help them) minus (number of people you harm x how much you harm them)
For example - harming a little bit all content creators of the world, by stealing their work without compensation or permission. How much does that cost globally every year after year? How do we even quantify long term consequences of that? Stuff like that.
Multiply that by many billions of chats per day.
Lawyers and other professionals charge a lot. So do artists, especially when you want to do a million revisions. LLMs hand it out for free, making many knowledge and art professions affordable and accessible to the masses.
Stable owners were upset when cars replaced horses, but you can't stop progress, especially when the value proposition is undeniable.
As for the LLM "creative" content, have you seen it or read it? Well, same problem. Once you need quality content, good luck finding some cheap creator. Pay full price for an experienced one and likely wait.
PS: I don't doubt that LLMs are here to stay. They will see a lot of usage and pervade all industries. It's just that the future will be pretty shit. Talking on the phone with LLMs, reading LLM slop, seeing LLM slop everywhere, receiving generated emails and using LLMs to reverse parse them to search for actual content, major economy downturn, rapidly slowing salary growth (not that it was big before), etc.