Posted by samrolken 2 days ago
Am I wrong to think that the answer is obvious? I mean, who wants web apps to behave differently every time you interact with them?
You or your coworker are not a web app. You can do some of the things that web apps can, and many things that a web app can't, but neither is because of the modality.
Coded determinism is hard for many problems, and I find it entirely plausible that it could turn out to be the wrong approach in software that is designed to solve some level of complex problems more generally. Average humans are pretty great at solving a certain class of complex problems that we tried to tackle unsuccessfully with many millions of lines of deterministic code, or simply have not had a handle on at all (like building a great software CEO).
Talk about a nonsensical non-sequitur, but I’ll bite. People want those to be deterministic too, to a large extent.
When people cook a meal with the same ingredients and the same times and processes (like parameters to a function), they expect it to taste about the same, they never expect to cook a pizza and take a salad out of the oven.
When they have sex, people expect to ejaculate and feel good, not have their intercourse morph into a drag race with a clown halfway through.
And when they want a “solution”, they want it to be reliable and trustworthy, not have it shit the bed unpredictably.
When products have limitations, those are usually acceptable to me if I know what they are or if I can find out what the breaking point is.
If the breaking point was me speaking a bit unclearly, I'd speak more clearly. If the breaking point was complex questions, I'd ask simpler ones. If the breaking point is truly random, I simply stop using the service because it's unpredictable and frustrating.
speak for yourself
Stellar description.
In the end, useful stuff is built by people caring about the details. This will always be true. I think in LLMs and broadly AI people see an escape valve from that where the thinking about the details can be taken off their hands, and that's appealing, but it won't work in exactly the same way that having a human take the details off your hands doesn't usually work that well unless you yourself understand the details to a large extent (not necessarily down to the atoms, but at the point of abstraction where it matters, which in software is mostly about deterministically how do the logic flows of the thing actually work and why).
I think a lot of people just don't intuit this. An illustrative analogy might be something else creative, like music. Imagine the conversation where you're writing a song and discussing some fine point of detail like the lyrics, should I have this or that line in there, and ask someone's opinion, and their answer is 'well listen, I don't really know about lyrics and all of that, but I know all that really matters in the end is the vibe of the song'. That contributes about the same level of usefulness as talking about how software users are ultimately looking for 'solutions' without talking about the details of said software.
Okay but when I start my car I want to drive it, not fuck it.
Every reason people prefer a car or bike over the bus is a reason non-deterministic agents are a bad interface.
And that analogy works as a glimpse into the future - we’re looking at a fast approaching world where LLMs are the interface to everything for most of us - except for the wealthy, who have access to more deterministic services or actual human agents. How long before the rich person car rental service is the only one with staff at the desk, and the cheaper options are all LLM based agents? Poor people ride the bus, rich people get to drive.
It has always seemed to me that workflow or processes need to be deterministic and not decided by an LLM.
This can be answered though, albeit imperfectly. On a more reductionist level, we are the cosmos experiencing itself. Now there are many ways to approach this. But just providing us with the right chemicals to feel pleasure/satisfaction is a step backwards. All the evolution of a human being, just to end up functionally like an amoeba or a bacteria.
So we need to retrace our steps backwards in this thought process.
I could write a long essay on this.
But, to exist in first place, and to keep existing against all the constraints of the universe, is already pretty fucking amazing.
Whether we do all the things we do, just in order to stay alive and keep existing, or if the point is to be the cosmos “experiencing itself”, is pretty much two sides of the same coin.
When you suddenly realize walking down the street that the very high fentanyl zombie is having a better day than you are.
Yeah, you can push the button in your brain that says "You won the game." However, all those buttons were there so you would self-replicate energy efficient compute. Your brain runs on 10 watts after all. It's going to take a while for AI to get there, especially without the capability for efficient self-repair.
In one scenario every atom's trajectory was destined from the creation of time and we're just sitting in the passenger seat watching. In another, if we do have free will then we control the "real world" underneath - the quantum and particle realms - as if through a UI. In the pod scenario, we are just blobs experiencing chemical reactions through some kind of translation device - but aren't we the same in the other scenarios too?
If we were in an imagined world and you are headed to work
You either walk out your door and there is a self driving car, or you walk out of your door and there is a train waiting for you or you walk out of your door and there is a helicopter or you walk out of your door and there is a literal worm hole.
Let's say all take the same amount of time, are equally safe, same cost, have the same amenities inside, and "feel the same" - would you care if it were different every day?
I don't think I would.
Maybe the wormhole causes slight nausea ;)
In order to get to your destination, you need to explain where you want to go. Whatever you call that “imperative language”, in order to actually get the thing you want, you have to explain it. That’s an unavoidable aspect of interacting with anything that responds to commands, computer or not.
If the AI misunderstands those instructions and takes you to a slightly different place than you want to go, that’s a huge problem. But it’s bound to happen if you’re writing machine instructions in a natural language like English and in an environment where the same instructions aren’t consistently or deterministically interpreted. It’s even more likely if the destination or task is particularly difficult/complex to explain at the desired level of detail.
There’s a certain irreducible level of complexity involved in directing and translating a user’s intent into machine output simply and reliably that people keep trying to “solve”, but the issue keeps reasserting itself generation after generation. COBOL was “plain english” and people assumed it would make interacting with computers like giving instructions to another employee over half a century ago.
The primary difficulty is not the language used to articulate intent, the primary difficulty is articulating intent.
The specific events that follow when asking a taxi driver where to go may not be exactly repeatable, but reality enforces physical determinism that is not explicitly understood by probabilistic token predictors. If you drive into a wall you will obey deterministic laws of momentum. If you drive off a cliff you will obey deterministic laws of gravity. These are certainties, not high probabilities. A physical taxi cannot have a catastrophic instant change in implementation and have its wheels or engine disappear when it stops to pick you up. A human taxi driver cannot instantly swap their physical taxi for a submarine, they cannot swap new york with paris, they cannot pass through buildings… the real world has a physically determined option-space that symbolic token predictors don’t understand yet.
And the reason humans are good at interpreting human intent correctly is not just that we’ve had billions of years of training with direct access to physical reality, but because we all share the same basic structure of inbuilt assumptions and “training history”. When interacting with a machine, so many of those basic unstated shared assumptions are absent, which is why it takes more effort to explicitly articulate what it is exactly that you want.
We’re getting much better at getting machines to infer intent from plain english, but even if we created a machine which could perfectly interpret our intentions, that still doesn’t solve the issue of needing to explain what you want in enough detail to actually get it for most tasks. Moving from point A to point B is a pretty simple task to describe. Many tasks aren’t like that, and the complexity comes as much from explaining what it is you want as it does from the implementation.
it's not as far as your experience goes - you press pedal, it accelerates. You turn the steering, it goes the way it turns. What the car does is deterministic.
More importantly, it does this every time, and the amount of turning (or accelerating) is the same today as it was yesterday.
If an LLM interpreted those inputs, can you say with confidence that you will accelerate in the way you predicted? If that is the case, then I would be fine with LLM-interpreted input for driving. Otherwise, how do you know, for sure, that pressing the brakes will stop the car before you hit somebody in front of you?
Of course, you could argue that the input is no longer you moving the brake pads etc. You just name a destination and you get there, and that is supposed to be deterministic, as long as you describe your destination correctly. But is that where LLMs are today? Or is that the imagined future of LLMs?
It's the same as having a function called "factorial" but you change the multiplication operation to addition instead.
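A minimal sketch of that factorial analogy, with made-up function names: the second function keeps the name's promise in its signature but swaps multiplication for addition, so callers silently get a different answer.

```python
def factorial(n):
    """Product of 1..n, which is what the name promises."""
    result = 1
    for i in range(1, n + 1):
        result *= i
    return result


def mislabeled_factorial(n):
    """Hypothetical: still reads like a factorial at the call site, but adds."""
    result = 0
    for i in range(1, n + 1):
        result += i
    return result


expected = factorial(5)           # 1 * 2 * 3 * 4 * 5
actual = mislabeled_factorial(5)  # 1 + 2 + 3 + 4 + 5
```

Nothing at the call site signals the change; only the result does.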
Either way, these silly reductionist games aren't addressing the point: if I just want to get from A to B then I definitely want the absolute minimum of unpredictability in how I do it.
I wonder now, if everything is always different and suddenly every day would be the same. How many times as terrifying would that be compared to the opposite?
Your ancestors didn't want horses and carts, bicycles, shoes - they wanted the solutions of the day to the same scenarios above.
To dismiss the entire universe and its hostilities towards our existence and the workarounds we invent in response as mere means to an end rather than our essence is truly wild.
My point was that there is no true end goal as long as whims continue. The need to craft yet more means is equally endless. The crafting is the primary human experience, not the using. The using of a means inevitably becomes transparent and boring.
A job is better if your coworkers are of a caliber that they become a secondary family.
Are you suggesting that an average user would want to precisely describe in detail what they want, every single time, instead of clicking on a link that gives them what they want?
The average user may not care exactly what the mechanic does to fix your car, but they do expect things to be repeatable. If car repair LLMs function anything like coding LLMs, one request could result in an oil change, while a similar request could end up with an engine replacement.
One could imagine a hypothetical AI model that can do a pretty good job of understanding vague requests, properly refusing irrelevant requests (if you ask a mechanic to bake you a cake he'll likely tell you to go away), and behaving more or less consistently. It is acceptable for an AI-based backend to have a non-zero failure rate. If a mechanic was distracted or misheard you or was just feeling really spiteful, it's not inconceivable that he would replace your engine instead of changing your oil. The critical point is that this happens very, very rarely and 99.99% of the time he will change your oil correctly. Current LLMs have far too high of a failure rate to be useful, but having a failure rate at all is not a non-starter for being useful.
Even if it is possible, I’m not sure if we will ever have the compute power to run all or even a significant portion of the world’s computations through LLMs.
LLMs are, of course, bad. Or not good enough, at least. But suppose they are. Suppose they're perfect.
Would I rather use an app or just directly interface with an LLM? The LLM might be quicker and easier. I know, for example, ordering takeout is much faster if I just call and speak to a person.
Yes but the same LLM works very differently on each request. Even ignoring non-determinism, extremely minor differences in wording that a human mechanic wouldn’t even notice will lead to wildly different answers.
> LLMs are, of course, bad. Or not good enough, at least. But suppose they are. Suppose they're perfect.
You’re just talking about magic at that point.
But suppose they do become “perfect”, I’m skeptical we’ll ever have the compute resources to replace a significant fraction of computation with LLMs.
Websites are tools. Tools being non-deterministic can be a really big problem.
So even if it would be better to have more flexibility, most businesses won't want it.
I can speculate about what LLM-first software and businesses might look like and I find some of those speculations more attractive than what's currently on offer from existing companies.
The first one, which is already happening to some degree on large platforms like X, is LLM powered social media. Instead of having a human designed algorithm handle suggestions you hand it over to an LLM to decide but it could go further. It could handle customizing the look of the client app for each user, it could provide goal based suggestions or search so you could tell it what type of posts or accounts you're looking for or a reason you're looking for them e.g. "I want to learn ML and find a job in that field" and it gives you a list of users that are in that field, post frequent and high quality educational material, have demonstrated willingness to mentor and are currently not too busy to do so as well as a list of posts that serve as a good starting point, etc.
The difference in functionality would be similar to the change from static websites to dynamic web apps. It adds even more interactivity to the page and broadens the scope of uses you can find for it.
And don't even try to claim there won't ever be any regression: Current LLM-based A.I. will 'happily' lie to you that they passed all tests -- because based on interactions in the past, it has.
knows where we’d end up?
On the other hand the logs might be a great read.
1. Outputting text (or, sometimes, images).
2. No long term storage except, rarely, closed-source "memory" implementations that just paste stuff into context without much user or LLM control.
This is a really neat glimpse of a future where LLMs can have much richer output and storage. I don't think this is interesting because you can recreate existing apps without coding... But I think it's really interesting as a view of a future with much richer, app-like responses from LLMs, and richer interactions — e.g. rather than needing to format everything as a question, the LLM could generate links that you click on to drill into more information on a subject, which end up querying the LLM itself! And similarly it can ad-hoc manage databases for memory+storage, etc etc.
LLM is just one model used in A.I. It's not a panacea.
For generating deterministic output, probably a combination of Neural Networks and Genetic Programming will be better. And probably also much more efficient, energy-wise.
For a typical user today’s software isn’t particularly deterministic. Auto updates mean your software is constantly changing under you.
LLMs being inherently non-deterministic means using this technology as the foundation of your UI will mean your UI is also non-deterministic. The changes that stem from that are NOT from any active participation of the authors/providers.
This opens a can of worms where there will always be a potential for the LLM to spit out extremely undesirable changes without anyone knowing. Maybe your bank app one day doesn't let you access your money. This is a danger inherent and fundamental to LLMs.
Consider this: the bank teller is non-deterministic, too. They could give you 500 dollars of someone else's money. But they don't, generally.
It will be difficult to incorporate relative access or restrictions to features with respect to users current/known state or actions. Might as well write the entire web app at that point.
I think, if we can efficiently capture a way to "make" LLMs conform to a set of processes, you can cut out the app and just let the LLM do it. I don't think this makes any sense for maybe the next decade, but perhaps at some point it will. And, in such time, software engineering will no longer exist.
The LLM example gives you a completely different UI on _every_ page load.
That’s very different from companies moving around buttons occasionally and rarely doing full redesigns
Not quite the case. Temperature 0 is not the same as random seed. Also there are downsides to lowering temperature (always choosing the most probable next token).
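A toy sketch of the distinction, assuming an invented three-token vocabulary with made-up logits; `sample_token` is a hypothetical helper, not any library's API. At temperature 0 it always takes the most probable token regardless of seed, while a fixed seed at temperature 1 gives a repeatable sequence that is not necessarily the most probable one.

```python
import math
import random


def sample_token(logits, temperature, rng):
    """Pick an index from logits; temperature 0 means greedy argmax."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [value / temperature for value in logits]
    top = max(scaled)
    # Softmax numerators; subtracting the max keeps exp() from overflowing.
    weights = [math.exp(value - top) for value in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


logits = [2.0, 1.5, 0.1]  # invented scores for three candidate tokens

# Temperature 0: the seed is irrelevant, index 0 wins every time.
greedy = [sample_token(logits, 0, random.Random(seed)) for seed in range(5)]

# Fixed seed, temperature 1: repeatable across runs, but still sampled.
rng_a = random.Random(42)
run_a = [sample_token(logits, 1.0, rng_a) for _ in range(10)]
rng_b = random.Random(42)
run_b = [sample_token(logits, 1.0, rng_b) for _ in range(10)]
```

This only models the sampling step. Hosted models can still return different outputs at temperature 0 for reasons outside the sampler, such as floating-point and batching effects.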
We absolutely should want developers to think.
This project would be the developer tool used to produce interactive tools for end users.
More practically, it just redefines the developer's position; the developer and end-user are both "users". So the developer doesn't need to think AND the user doesn't need to think.
Technically everyone, we stopped using static pages a while ago.
Imagine pages that can now show you e.g. infinitely customizable UI; or, more likely, extremely personalized ads.
Product owners were happy.
Until users came for us with pitchforks as they didn’t want stuff to change constantly.
We backed out to releasing on monthly cadence.
When I go to the dmv website to renew my license, I want it to renew my license every single time
Yeah, NO.
Maybe the browser should learn to talk back.
You could store the pages in the database and periodically generate a new version based on the current set of pages and the share of traffic they enjoy. You would get something that evolves and stabilizes in some niche. Have an initial prompt like: "dinosaurs!" Then sit back and see the magic unfold.
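One way that loop could look, as a rough sketch only: the selection rule (copy the most visited page over the least visited one) and the `generate_variant` callable standing in for an LLM call are both assumptions, not anything the project is known to do.

```python
def evolve(pages, visits, generate_variant):
    """Return (new page set, traffic share per page).

    pages: dict of page id -> page text
    visits: dict of page id -> visit count since the last run
    generate_variant: hypothetical callable (seed_text, all_pages) -> new
        text, standing in for an LLM call
    """
    total = sum(visits.get(page_id, 0) for page_id in pages) or 1
    share = {page_id: visits.get(page_id, 0) / total for page_id in pages}
    best = max(share, key=share.get)
    worst = min(share, key=share.get)
    new_pages = dict(pages)
    if best != worst:
        # The least visited page is replaced by a variant of the most visited.
        new_pages[worst] = generate_variant(pages[best], pages)
    return new_pages, share


# Stub generator so the sketch runs without a model behind it.
def stub_variant(seed_text, all_pages):
    return seed_text + " (v2)"


pages = {"a": "dinosaurs!", "b": "ferns", "c": "rocks"}
visits = {"a": 70, "b": 25, "c": 5}
new_pages, share = evolve(pages, visits, stub_variant)
```

Run on a schedule, the popular pages persist and the unpopular ones keep getting replaced, which is the "stabilizes in some niche" behavior described above.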
The hard part (coming from this direction) is enshrining the translation of specific user intentions into deterministic outputs, as others here have already mentioned. The hard part when coming from the other direction (traditional web apps) is responding fluidly/flexibly, or resolving the variance in each user's ability to express their intent.
Stability/consistency could be introduced through traditional mechanisms (encoded instructions, systematically evaluated) or, via the LLM's language interface, through intent-focusing mechanisms: increasing the prompt length, or hydrating the user request with additional context/intent: "use this UI, don't drop the db."
From where I'm sitting, LLMs provide a new modality for evaluating intent. How we act on that intent can be totally fluid, totally rigid, or, perhaps obviously, somewhere in-between.
Very provocative to see this near-maximum example of non-deterministic fluid intent interpretation>execution. Thanks, I hate how much I love it!
I thought this didn't work? You basically end up fitting your AI models to whatever is the internal evaluation method, and creating a good evaluation method most often ends up having a similar complexity as creating the initial AI model you wanted to train.
Why would I need programs with colors, buttons, actual UI ?
I am trying to imagine a future where file navigators don't even exist: "I want to see the photos I took while I was on vacation last year. Yes, can you remove that cloud? Perfect, now send it to XXXX's computer and say something nice."
"Can you set some timers for my sport session, can you plan a pure body weight session ? Yes, that's perfect. Wait, actually, remove the jumping jacks."
"Can you produce a detroit style techno beat I feel like I want to dance."
"I feel life is pointless without a work, can you give me some tasks to achieve that would give me a feeling of fulfillment ?"
"Can you play an arcade style video game for me ?"
"Can you find me a mate for tonight ? Yes, I prefer black haired persons."
Better yet, why exercise, which is so repetitive, if we can create a machine that just does it for you, including the dopamine triggering? Why play an arcade video game when we can create a machine that fires the neurons needed to produce the exact same level of excitement as the best video game?
And why find mates when my robot can morph into any woman in the world, or better yet, when brain implants can trigger the exact same feelings as having sex and love?
Bleak, we are oversimplifying existence itself and it doesn't lead to a nice place.
"Make me happy"
"Make me happy"
"Make me happy"
We are already on this path for many-many years, certainly decades if not centuries, although availability was definitely spotty in the past.
It is also kind of impossible to hop off this train, while it is individually possible to reject any of these conveniences, in general they just become a part of life. Which is not necessarily a bad thing, but just different.
lol, that seems like a reference to the William Gibson quote "The future is already here, it's just unevenly distributed"
Contextualized to "web-apps," as you have; navigating a list maybe requires an interface. It would be fairly tedious to differentiate between, for example, the 30 pairs of pants your computer has shown you after you asked "help me buy some pants" without using a UI (ok maybe eye-tracking?).
How do you calculate for that?
Back in the 90s, Fuzzy Logic was thought to be the solution. In a way, yes, but only for niche/specialized purposes, and they still have to limit the variables being evaluated.
But then it will become a tradeoff of complexity vs longevity.
And why? There are reasonably well done, low maintenance, temperature balancing valves out there.
And they do typically last 20+ or more years.
As for repetitive tasks, you can just explain to your computer a "common procedure" ?
We just gloss over the details in these hypothetical irregular or abstract tasks because we imagine they would be done as we imagine them. We don’t have experience trying to tell the damn AI to not delete that cloud (which one exactly?) but the other one via a voice UI. Which would suck and be super irritating, btw.
We know how irritating it would be to turn the shower off/on, because we do that all the time.
No matter how capable the friend is, it's oftentimes easier to do a task directly in a UI rather than to have to verbalize it to someone else.
And there are people that unfortunately cannot speak.
Fortunately, there are solutions.
I want to add that I think you are missing my argument here. Devices that allow you to speak without speaking shall soon be available to us [0].
The important aspect of my position is to think about the relevance of "applications" and "programs" in the age of AI, and, as an exaggeration of what is shown in that post, I was wondering if in the end, UI is not bound to disappear.
> “That’s right,” I said, “or even worse, it could be perfect.”
-- William Gibson: The Gernsback Continuum
I realize it sounds inhuman, but so is working in enterprise IT! :)
Growing the food that a human eats, running the air conditioning for their home, powering their lights, fueling their car, charging their phone, and all the many many things necessary to keep a human alive and productive in the 21st century are a larger resource cost than almost any machine/system that performs the same work. From an efficiency perspective, automation is almost always the answer. The actual debate comes from the ethical perspective (the innate value of human life).
Even including our systems of comfort, like refrigerated blueberries in January and AC cooling 40 °C heat down to 25 °C (but excluding car commutes, because please work from home or take public transit), the human is still far, far more energy efficient at e.g. playing Go than AlphaGo. With LLMs this isn’t even close (and we can probably factor in that stupid car commute, because LLMs are just that inefficient).
To clarify, I was making a broad statement about automation in general. Running an automated loom is more efficient in every way than getting humans to weave cloth by hand. For most tasks, automation is more efficient.
However, there are tasks that humans can still do more efficiently than our current engines of automation. Go is a good example because humans are really good at it and AlphaGo can only sometimes beat the top players despite massive training and inference costs.
On the other hand, I would dispute that LLMs fall into this category, at least for most tasks, because we have to factor in marginal setup costs too. I think that raising from infancy all of the humans needed to match the output speed of an LLM has a greater cost than training the LLM. Even if you include the cost of mining the metal and powering the factories necessary to build the machines that the LLMs run on. I'm not 100% confident in this statement, but I do think that it's much closer than you seem to think. Supporting the systems that support the systems that support humans takes a lot of resources.
To use your blueberries example, while the cost of keeping the blueberries cold isn't much, growing a single serving of blueberries requires around 95 liters of water[1]. In a similar vein, the efficiency of the human brain is almost irrelevant because the 20 watts of energy consumed by the brain is akin from a resource consumption perspective to the electricity consumed by the monitor to read out the LLM's output: it's the last step in the process, but without the resource-guzzling system behind it, it doesn't work. Just as the monitor doesn't work without the data center which doesn't work without electricity, your brain doesn't work without your body which doesn't work without food which doesn't get produced without water.
As sramam mentioned, these kinds of utilitarian calculations tend to seem pretty inhuman. However, most of the time, the calculations turn out in favor of automation. If they didn't, companies wouldn't be paying for automated systems (this logic doesn't apply to hype-based markets like AI. I'm talking more about markets that are stably automated like textile manufacturing). If you want an anti-automation argument, you'll have a better time arguing based on ethics instead of efficiency.
Again, thanks for the Go example. I genuinely didn't consider the tasks where humans are more efficient than automation.
[1]: https://watercalculator.org/water-footprint-of-food-guide/
Instead I would like to shift the focus to the benefits of LLMs. I know the costs are high, very very very high, but you seem to think that the benefits, measured in time saved, are also very high; that is, that the number of tasks automated is enough to save humans doing similar tasks by miles. If that is what you think, I disagree. LLMs have yet to prove themselves in real-world applications. When we actually do measure how many work-hours LLMs save, we see that the effects are at best negligible (see e.g. https://news.ycombinator.com/item?id=44522772). Worse, generative AI is disrupting our systems in worse ways, where e.g. teachers, peer-reviewers, etc. have to put in a bunch of extra work to verify that the submitted work was actually written by that person, and not simply generated by AI. Just last Friday I read that arXiv will no longer accept submissions unless they have been previously peer-reviewed because they are overwhelmed by AI generated submissions[1].
There are definitely technologies which have saved us time and created a much more efficient system than was previously possible. The loom is a great example of one, I would claim the railway is another, and even the digital calculator for sure. But LLMs, and generative AI more generally, are not that. There may be uses for this technology, but automation and energy/work savings are not among them.
1: https://blog.arxiv.org/2025/10/31/attention-authors-updated-...
1. The human brain draws 12 - 20 watts [1, 2]. So, taking the lower end, a task taking one hour of our time costs 12 Wh.
2. An average ChatGPT query is between 0.34 Wh - 3 Wh. A long input query (10K tokens) can go up to 10 Wh. [3] I get the best results by carefully curating the context to be very tight, so optimal usage would be in the average range.
3. I have had cases where a single prompt has saved me at least an hour of work (e.g. https://news.ycombinator.com/item?id=44892576). Let's be pessimistic and say it takes 3 prompts at 3 Wh (9 Wh) and 10 minutes (2 Wh) of my time prompting and reviewing to complete a task. That is 11 Wh for the same task, which still beats out the human brain unassisted!
And that's leaving aside the recent case where I vibecoded and deployed a fully-tested endpoint on a cloud platform I had no prior experience in, over the course of 2 - 3 hours. I estimate it would have taken me a whole day just to catch up on the documentation and another 2 days tinkering with the tools, commands and code. That's at least an 8x power savings assuming an 8-hour workday!!
4. But let's talk data instead of anecdotes. If you do a wide search, there is a ton of empirical evidence that AI improves programmer productivity by 5 - 30% (with a lot of nuance). I've cited some here: https://news.ycombinator.com/item?id=45379452 -- there is no measure of the amount of prompt usage to estimate energy usage, but those are significant productivity boosts.
Even the METR study that appeared to show AI coding lowering productivity also showed that AI usage broadly increased idle time in users. That is, calendar time for task completion may have gone up, but that included a lot of idle time where people were doing no cognitive work at all. Someone should run the numbers, but maybe it resulted in lower power consumption!
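The back-of-the-envelope arithmetic in point 3 can be written out as below. The inputs are the figures quoted above (12 W for the brain, 3 Wh per prompt, 3 prompts, 10 minutes of prompting and review), taken at face value; the variable and function names are just for illustration, and training cost is left out, as it is in that point.

```python
BRAIN_WATTS = 12.0    # lower end of the 12 - 20 W range quoted above
WH_PER_PROMPT = 3.0   # pessimistic per-query figure quoted above


def brain_wh(minutes, watts=BRAIN_WATTS):
    """Energy in watt-hours for a brain drawing `watts` for `minutes`."""
    return watts * minutes / 60


# One hour of unassisted work.
unassisted_wh = brain_wh(60)

# Three prompts plus ten minutes of human prompting and review.
assisted_wh = 3 * WH_PER_PROMPT + brain_wh(10)

print(f"unassisted: {unassisted_wh} Wh, assisted: {assisted_wh} Wh")
```

The margin is 1 Wh, so the conclusion is sensitive to the inputs: a fourth prompt, or a brain figure nearer 20 W for the review time, shifts the comparison.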
---
But what about the training costs? Sure we've burned gazillions of GWh on training already, and the usual counterpoint is "what about the cost involved in evolution?" but let's assume we stopped training all models today. They will still serve all future prompts at the same power consumption rates discussed above.
However every new human will take 15 - 20 years of education to get to be a novice in a single domain, followed by many more years of experience to become proficient. We're comparing apples and blueberries here, but that's a LOT of energy to even start becoming productive, but a trained LLM is instantly productive in multiple domains forever.
My hunch is that if we do a critical analysis of amortized energy consumption, LLMs will probably beat out humans. If not already, soon with the rate of token costs plummeting all the time.
[1] https://psychology.stackexchange.com/questions/12385/how-muc...
[2] https://press.princeton.edu/ideas/is-the-human-brain-a-biolo...
[3] https://epoch.ai/gradient-updates/how-much-energy-does-chatg...
In your LLM coding example you have a human and an AI model collaborating on a single task, both spend some amount of energy (taking your assumptions at face value, compatible amount of energy) and produce a single outcome. In the go example it is easy to compare energy usage and the quality of the outcome is also easy to measure (simply who won the game). In your coding example the quality of the outcome is impossible to measure, and because the effort is collaborative, splitting the energy usage is complected.
When talking about automation my game of go example falls apart. A much better examples would be something like a loom, or a digital calculator. These tools help the human arrive at a particular outcome much faster and with much less effort then a human performing the task without the help of the machines. The time saved by using these tools are measured in several orders of magnitudes, and the energy spent is at par with a human. It is easy to see how a loom or a digital calculator are more efficient then a human.
I guess if we take into account the training cost of an LLM we should also take into account the production costs of looms and digital calculators. I don't know how to do that, but I can't imagine it would be anywhere close to that of an LLM.
And with an LLM we have increased productivity not by 5000x[1], but by 5%-30%. To me this does not sound like a revolutionary technology. But I have my doubts about even the 5%-30% figure. We have preliminary research ranging anywhere from a negative productivity change to your cited 5%-30%. We will have to wait for more research, and possibly some meta-analysis, before we can accurately assess the productivity boost of LLMs. But we will have to do a whole lot better than 5%-30% to justify the huge energy consumption of AI[2].
Personally, I am not convinced by your back-of-the-envelope calculations. It fails my sniff test that 9 Wh of matrix multiplication will consistently save you an hour of using your brain to perform the same task adequately. I know our brains are not super good at the logic required for coding (but neither are LLMs), but I know for a fact they are very energy-efficient at it.
That said, I refuse to accept your framing that we can simply ignore the energy used in training, on the basis that counting it is as invalid as counting the energy used for evolving into our species, or that we can simply stop training new models and use the models we already have. That is simply not how things work. New models will get trained (unless the AI bubble bursts and the market loses interest), and the energy consumed by training is the bulk of the energy cost. Omitting it makes the case for AI comically easy to justify. I reject this framing.
Instead of calculating, I'm gonna do a thought experiment. Imagine a late 19th century where iron and steel production took an entire 2% of the world's energy consumption[3] (maybe an alternate reality where ironworking is simply that challenging and requires much higher temperatures). But the steam train could only carry the same load as a 20-mule team, and would only do it 5%-30% faster on average than the state-of-the-art cargo carriages of the time without steam power. Would you accept the argument that we should simply ignore the fact that rail production takes a whopping 2% of global energy consumption when factoring in the energy consumption of the steam train, even when it only gives you a 5%-30% productivity boost? I don't think so.
---
1: I don't know how much the loom has increased productivity, but this is what I would guess, without any idea of how to even find out.
2: That is, if you are only interested in the increased productivity. If you are interested in LLMs for some other reason, those reasons will have to be measured differently.
3: https://www.allaboutai.com/resources/ai-statistics/ai-enviro...
Efficiencies lead to fewer resources being used if demand is constant, but if demand is elastic, they often lead to total resource consumption increasing.
See also: Jevons Paradox (https://en.wikipedia.org/wiki/Jevons_paradox).
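A toy model makes the elastic-demand case concrete. It assumes constant-elasticity demand and that the price of the service falls in proportion to the efficiency gain; both are simplifying assumptions of this sketch, not something claimed in the thread, and the function name is made up for illustration.

```python
def resource_use_ratio(efficiency_gain: float, demand_elasticity: float) -> float:
    """Total resource use after an efficiency gain, relative to before.

    Assumptions of this sketch: each unit of service needs 1/efficiency_gain
    as much resource as before, the price falls by the same factor, and
    demand scales as price ** (-demand_elasticity).
    """
    demand_ratio = efficiency_gain ** demand_elasticity
    resource_per_unit_ratio = 1.0 / efficiency_gain
    return demand_ratio * resource_per_unit_ratio


# A 2x efficiency gain under three different demand elasticities.
for elasticity in (0.0, 1.0, 2.0):
    print(elasticity, resource_use_ratio(2.0, elasticity))
# 0.0 0.5   (constant demand: resource use halves)
# 1.0 1.0   (demand grows exactly enough to cancel the saving)
# 2.0 2.0   (elastic demand: total resource use doubles)
```

Under these assumptions the ratio is `efficiency_gain ** (demand_elasticity - 1)`, so total consumption rises whenever the elasticity is above 1.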
Just ask Elon about his efforts to fully automate Tesla production.
Same as A.I. Current LLM-based A.I.s are not at all as efficient as a human brain.
Multiply that by dozens or hundreds of self-updating programs on a typical machine. Absolutely insane amounts of resources.
Some ideas - use a slower 'design' model at startup to generate the initial app theme and DB schema and a 'fast' model for responses. I tried a version using PostgREST so the logic was entirely in the DB, but then it got too complicated: either the design model failed to one-shot a valid schema or the fast model kept generating invalid queries.
I also use some well known CSS libraries and remember previous pages to maintain some UI consistency.
It could be an interesting benchmark, or "App Bench": how well can an LLM one-shot a working application?
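The slow-design/fast-response split with remembered pages could be sketched roughly like this. Everything here is hypothetical: `GeneratedApp`, `Model`, and the prompts are illustrative names, and the two models are stubbed with plain functions rather than calls to any real LLM service.

```python
from collections import deque
from typing import Callable

# Assumption: a "model" is any callable that maps a prompt string to text.
Model = Callable[[str], str]


class GeneratedApp:
    def __init__(self, design_model: Model, fast_model: Model,
                 remembered_pages: int = 3) -> None:
        # One slow call at startup fixes the theme and schema for the session.
        self.design = design_model("Generate an app theme and a DB schema.")
        self.fast_model = fast_model
        # Only the most recent pages are kept, to bound the prompt size.
        self.recent_pages = deque(maxlen=remembered_pages)

    def handle(self, request: str) -> str:
        # Every response sees the fixed design plus the last few pages,
        # which is what keeps the UI roughly consistent between requests.
        prompt = "\n\n".join([self.design, *self.recent_pages, request])
        page = self.fast_model(prompt)
        self.recent_pages.append(page)
        return page


# Stub models so the sketch runs without any LLM service.
app = GeneratedApp(
    design_model=lambda prompt: "THEME: plain; SCHEMA: items(id, name)",
    fast_model=lambda prompt: f"<page built from {len(prompt)} chars of context>",
)
print(app.handle("GET /items"))
```

The design model is called exactly once; only the fast model sits on the request path.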
What if we ran AI locally and used it to actually do labor-intensive things with computers that make money rather than assuming everything were web-connected, paywalled, rate-limited, authenticated, tracked, and resold?
I don't see the point in using probabilistic methods to perform deterministic logic. Even if its output is correct, it's wasteful.
On the one hand, there’s „classical“ software that is developed here and deployed there — if you need a change, you need to go over to the developers, ask for a change & deploy, and thus get the change into your hands. The work of the developers might be LLM-assisted, but that doesn’t change the principle.
The other extreme is what has been described here, where the LLM provides the software „on the fly“.
What I'm imagining is a piece of software, deployed on a system and provided in the usual way, say, a web application for managing inventory.
Now, you use this software as usual.
However, you can also „meta-use“ the software, as in: you click a special button, which opens a chat interface to an LLM.
But the trick is, you don’t use the LLM to support your use case (as in „Dear LLM, please summarize the inventory“).
Instead, you ask the LLM to extend the software itself, as in: „Dear LLM, please add a function that allows me to export my inventory as CSV“.
The critical part is what happens behind the scenes: the LLM modifies the code, runs quality checks and tests, snapshots the database, applies migrations, and then switches you to a „preview“ of the new feature, on a fresh, dedicated instance, with a copy of all your data.
Once you are happy with the new feature (maybe after some more iterations), you can activate/deploy it for good.
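The behind-the-scenes sequence could be sketched as an ordered list of steps where the first failure stops everything and the user never reaches the preview. The step names and `run_change_request` are hypothetical, and each step is stubbed with a lambda; a real system would call the LLM, the test runner, the database, and the deployment tooling instead.

```python
from typing import Callable, List, Tuple

# A step is a name plus a callable that returns True on success.
Step = Tuple[str, Callable[[], bool]]


def run_change_request(steps: List[Step]) -> Tuple[bool, List[str]]:
    """Run the steps in order and stop at the first failure.

    Returns (ok, names of the steps that completed)."""
    completed: List[str] = []
    for name, action in steps:
        if not action():
            return False, completed
        completed.append(name)
    return True, completed


# Hypothetical stand-ins for the stages described above.
steps: List[Step] = [
    ("modify_code", lambda: True),
    ("run_checks_and_tests", lambda: True),
    ("snapshot_database", lambda: True),
    ("apply_migrations", lambda: True),
    ("start_preview_instance", lambda: True),
]

ok, completed = run_change_request(steps)
print(ok, completed[-1])  # True start_preview_instance
```

Activation is deliberately not part of the list: in this sketch it stays a separate, explicit action taken only after the user has approved the preview.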
I imagine this could be a promising strategy to turn users into power-users, but there is certainly quite some complexity involved in getting it right. For example, what if the application has multiple users, and two users want to change the application in parallel?
Nevertheless, shipping software together with an embedded virtual developer might be useful.