Corporate America Is Starting to Ration AI as Cost Skyrockets

Posted by 1vuio0pswjnm7 2 hours ago

Corporate America Is Starting to Ration AI as Cost Skyrockets (www.wsj.com)

81 points | 76 comments

tyingq 1 hour ago|

The abrupt swing in many non-technology company IT departments from "hey developer, you aren't using enough tokens" to this is just too funny.

And I'm seeing almost no self-awareness from leaders. They are making decisions about things that they just don't understand. And are completely unworried about it. Just blindly following whatever the news cycle is about AI.

datakan 1 hour ago||

The closer people live to the consequences of their decisions, the more rational they become. Until leaders (and I use that term loosely) are held accountable, the insanity will continue.

greesil 55 minutes ago|||

Their only accountability is to the stock price. The insanity will continue.

oofbey 29 minutes ago||||

I’m sorry you are used to working with out of touch leadership. Not all companies are like that. Even big ones can have smart, empathetic leaders. Although very often money gets in the way of empathy.

qoez 1 hour ago|||

I feel like most successful businesses have such a moat of required capital that, even though poor decisions like this are in theory supposed to give entrepreneurs an opening to strike when the big dogs make a wrong move, it doesn't end up happening.

steve1977 17 minutes ago|||

That's nothing new though. It's just very obvious this time.

surgical_fire 4 minutes ago|||

I've never seen self-awareness from leaders. They always lead on vibes.

Understanding this was one of the most important things in my career.

sdeframond 51 minutes ago|||

Groups resist change - the bigger the group, the more resistance there is.

As a leader, pushing for rapid change cannot really be nuanced lest the push dissipate into the organization's entropy.

HarHarVeryFunny 25 minutes ago||

Perhaps, but the change you get (if any) is most likely to be what you push for and reward/punish.

It's irrational to push for tokenmaxxing (literally "please increase our AI spending") and not expect that this is the result you are going to get. You won't get productivity increase, since that is not what you are pushing for - you will get token usage maximization (engineers running inane agentic tasks against your code base to increase usage, using company paid AI for their side projects, etc, etc).

SpicyLemonZest 17 minutes ago||

I'm not sure the leaders would disagree with what you're saying. They tokenmaxxed to understand what it looks like when AI gets into every corner of the business; now they feel they've gotten enough info (or at least that more info wouldn't be worth the cost), so they're adding in cost controls. As the article says, this is not great for AI model providers trying to predict what their future revenue is going to be, but it's not obvious that there's any mistake here for AI users.

vasco 15 minutes ago|||

During ZIRP they discovered that the way to lead companies nowadays is to become a maxxer of whatever the current fad is, and the more you maxx the better. And then when things change and you're wrong, you'll be a strong leader and, in ZIRP's case, fire everyone you over-hired. With AI it will be similar.

Why be a normal guy that waits to see what happens and is measured and pragmatic when you can get attention basically through the whole cycle by being the earliest adopter, adopt it to the maxx, then also be the loudest big brain when the tide changes and be praised for "taking hard decisions" when you revert everything you said so far?

The fakemaxxing economy.

onlyrealcuzzo 1 hour ago||

The actual cost is going to drop 99% in ~4 years.

How much of that makes it into enterprise pricing is TBD, since none of the hyperscalers are making money yet off selling AI inference.

Almost all businesses are jumping the gun. For most of their use cases, AI is either not yet good enough on its own, or good enough but too expensive.

No one wants to get left behind, so everyone's trying to get onto it now, even though it's not ready for what most enterprises want to do with it.

It's easy for them to look at a small startup without billions of lines of legacy business logic debt, see it having success, and wonder why they can't have just as much, or more. They're bigger, so they should have better and more success, right???

Wrong...

But when it gets ~99% cheaper for local inference over the next 4 years, at the same time as the price per watt improves 4x -> a lot of those cases will start to pencil out.

BearOso 31 minutes ago|||

Going from Opus 4.5 to 4.7 secretly required 6x more compute to run. 4.8 is apparently 30% more on top. I haven't seen any optimizations lately aside from distillation. Nobody's optimizing, they're just scaling up.

trollbridge 7 minutes ago|||

DeepSeek and Alibaba would like to have a word.

rescbr 14 minutes ago|||

> Nobody's optimizing

The Chinese, since they lack computing hardware due to US export controls, are.

trollbridge 7 minutes ago||

And our export controls are going to turn China into a winner in the AI arms race if we're not careful.

krona 1 hour ago||||

> The actual cost is going to drop 99%

Do you mean the marginal cost by the producer, or the cost on the consumer? I can't see the price of electricity falling much, and the demand curve is apparently exponential if the hype is to be believed.

trollbridge 7 minutes ago||

DeepSeek V4 Pro is 99% cheaper than similarly performing models were 2 years ago (if such a model even existed).

Computing has always been about how to wring out more efficiency. The ENIAC was 150,000 watts, with 3 phase 240 volt power, and cost about $500,000.

My day-to-day laptop (a year old) is 35 watts, with 1 phase 20 volt power, and cost $1,000, so that's 99.98% less power consumption, 99.8% cheaper, and it has about 10 orders of magnitude more computing power, all over a time span of 80 years.
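The percentages above can be checked with a few lines. This is only a sketch of the arithmetic using the wattage and price figures quoted in this comment (nominal dollars, as stated); the helper name `percent_reduction` is made up for illustration.

```python
# Figures as quoted in the comment (nominal dollars, not inflation-adjusted).
ENIAC_WATTS, ENIAC_COST = 150_000, 500_000
LAPTOP_WATTS, LAPTOP_COST = 35, 1_000


def percent_reduction(old: float, new: float) -> float:
    """Return how much smaller `new` is than `old`, as a percentage."""
    return (1 - new / old) * 100


power_cut = percent_reduction(ENIAC_WATTS, LAPTOP_WATTS)
cost_cut = percent_reduction(ENIAC_COST, LAPTOP_COST)

# Round the same way the comment does: two decimals for power, one for cost.
print(f"power: {power_cut:.2f}% less, cost: {cost_cut:.1f}% cheaper")
```

Both quoted percentages follow directly from the quoted inputs.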

datakan 1 hour ago||||

What makes you think prices will drop? Everyone I’ve spoken to believes they will only skyrocket. Genuinely curious

onlyrealcuzzo 59 minutes ago||

The technology already exists now on the algorithmic front for the next 10x drop between everyone adopting DeepSeek's MLA, MoE (mostly already done), Medusa (a better version of Google's speculative decoding), Kimi's Attn Residuals, and Mimo's Sliding Window Attn, and (possibly) Microsoft's 1.58b (this may be a nothing burger).

Historically, every 18 months, the price for the same level of quality has gone down 90%.

See: https://www.reddit.com/r/LocalLLaMA/comments/1gpr2p4/llms_co...

And Chart 13 here: https://www.rdworldonline.com/ais-great-compression-20-chart...

And here: https://epoch.ai/data-insights/llm-inference-price-trends

Historically, algorithmic gains are only ~30% of the pie, but there's enough out there to get to 10x, with just what's available already. The other ~70% of the pie is better training data (often synthetic) and distilling frontier knowledge. There's no sign we are tapped out on that front.

Additionally, GRAM (from ~10 days ago) is likely to be a 5-10x on its own (if not substantially more for smaller models). It's unlikely within 4 years LeCun's JEPA ideas and similar ideas like GRAM applied to LLMs have ZERO impact. The preliminary results are absolutely astounding (5000x better reasoning - this is not peanuts).

Further, that's not even counting that cost per watt is still dropping ~2x every 2 years on its own on the hardware front.

If you look at the "cost" of inference. People think it's electricity - but it's currently almost ~80% hardware amortization. The memory shortage is not going to last, nor are Nvidia's ~80-90% margins.

The human brain is still 8-10 orders of magnitude more efficient than the best LLMs of today. With ~1/10th of global capex riding on AI, if you don't think they're going to knock off 2 orders of magnitude more, when it's this obvious and easy... I don't know what to tell you...

Sure, it might take 6 years instead of 4. My crystal ball isn't perfect.
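To make the compounding explicit, here is a minimal sketch that extrapolates the trend claimed above. The 90%-per-18-months rate is this comment's figure, not a measured constant, and both function names are hypothetical.

```python
import math


def remaining_fraction(months: float, drop: float = 0.90, period_months: float = 18) -> float:
    """Fraction of the original price left after `months`, if the price
    falls by `drop` every `period_months` (assumed to compound smoothly)."""
    return (1 - drop) ** (months / period_months)


def months_to_reach(target_drop: float, drop: float = 0.90, period_months: float = 18) -> float:
    """Months needed for the cumulative price drop to reach `target_drop`."""
    return period_months * math.log(1 - target_drop) / math.log(1 - drop)


for years in (4, 6):
    left = remaining_fraction(years * 12)
    print(f"after {years} years: {(1 - left) * 100:.2f}% cheaper")

print(f"months to a 99% drop: {months_to_reach(0.99):.1f}")
```

Under that assumption a 99% drop is simply two consecutive 90% drops, so whether the estimate holds depends entirely on whether the 18-month trend continues.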

HarHarVeryFunny 17 minutes ago|||

Sure, the price will come down a lot, even if we can argue about the timeline.

I think what will also happen, once we get past this current CEO AI FOMO mania, is that companies will start to look at AI spending more rationally like any other company expense, and will revert to more rational decision making.

Even if the cost comes down considerably over the next few years, that's plenty of time for companies to look at their financial results and question why AI expenditure isn't resulting in an increase in revenue and/or profitability.

datakan 33 minutes ago||||

This is great food for thought, thank you

onlyrealcuzzo 18 minutes ago||

Additionally, on the context front -> all the labs are aware that for many tasks you can get 10x+ increases in output quality by feeding better context.

See https://arxiv.org/abs/2604.04364.

This won't really show up in benchmarks, but it will impact real world usage on the most common use cases.

I'm doing a study right now on the impacts of better context for small models to fix bugs.

A very dumb algorithm can make small models perform like models 10x+ their size. I'll be surprised if it can't get to 20x+

rednb 19 minutes ago||||

I didn't take you seriously initially but after reading this, i think you are the real deal.

Thank you for sharing this and for having the intellectual courage to hold to a sound reasoning that may be unpopular initially.

Nimitz14 14 minutes ago|||

This is mostly slop. But you may be directionally correct

packetlost 1 hour ago||||

I don't see how this is even remotely true. Unless there's some super breakthrough into a fundamentally different architecture, there's not really a path to a 50% reduction in price, much less a 99% reduction.

onlyrealcuzzo 23 minutes ago||

And yet 90% drops for the same level of quality every 18 months have happened like clockwork...

And the technology already exists on the algorithmic front TODAY to lock in another 10x gain -> when, typically, algorithmic gains only account for ~30% of that drop and the other ~70% comes from better data (often synthetic) and knowledge distillation from frontier models.

Just look at DeepSeek's pricing...

bakugo 1 hour ago|||

Prices have been very obviously trending up, not down. Even open weights models are becoming more expensive with every release. Computer hardware is ballooning in price.

trollbridge 3 minutes ago|||

Grab a 5090 and run Qwen 3.6 35b on it (6 parameter seems to work best for me).

Then buy $10 (or $2, if you're cheap, and they take PayPal) of DeepSeek credits.

Whilst you're at it spring for a Claude subscription too and GPT.

Switch models between Qwen, DeepSeek Flash, DeepSeek Pro, and you can meet 99% of your code generation needs.

Hop over to Opus 4.7 (or 4.8, but I haven't really used it yet) and GPT-5.5 when doing very complex architecture/design or troubleshooting something where DeepSeek Pro is getting stuck.

It is ridiculous how cheap this stuff is now. It's affordable at third world prices.

onlyrealcuzzo 14 minutes ago||||

Prices are going up for BETTER quality -> not for the SAME level of quality.

People are willing to pay more for BETTER quality.

You obviously haven't seen DeepSeek v4 Pro's pricing if you think pricing only goes up...

abalashov 34 minutes ago|||

Just wait for the next model and the next model architecture. Just wait for it, bro.

amazingamazing 1 hour ago||

AI is overhyped. I have yet to see an impressive end-user product, one that isn't itself a wrapper around LLMs, that was created with LLM assistance. I have also yet to see dramatic revenue increases at companies using LLMs that aren't selling things into the AI supply chain. Is it a nice affordance? Sure. 1T capex good? No.

If it was so good I would expect to see 2005-2015 advancements yearly.

Meanwhile China is blowing past the world with real improvements in the real world: solar, EVs, etc. Meanwhile people keep making their fancy sans serif websites about todo apps, faster than ever before. Useless.

criddell 46 minutes ago||

> I have yet to see an end user product that in itself isnt a wrapper around LLMs that is impressive created by LLM assistance.

I don’t disagree that AI is overhyped. But I think you are probably looking in the wrong place.

I think most software that is written isn’t really a product, at least not a public product. It’s an in-house tool or a one-off project needed to complete some larger task. People everywhere are always writing small programs that make their life or job just a bit easier (which explains why so many corporate projects are little more than an Excel spreadsheet).

And there are a lot of people who have made custom software just for themselves with AI. Not a product, just a tool or project that finally made sense to build.

pessimizer 27 minutes ago||

But where's the revenue from those? It has to add up to a couple trillion dollars to break even on the capital spending.

pocksuppet 16 minutes ago||

Would you say the same about any other tool, like where is the revenue caused by Susan in accounting having a computer, shouldn't we take away her computer if she can't prove a benefit?

trollbridge 2 minutes ago|||

AI is both overhyped and revolutionary at the same time.

I would agree that a lot of companies talking a big talk about using LLMs are failing to actually apply it in a sensible way to their business.

dawnerd 1 hour ago||

Productivity gains seem like at best a wash when you factor in the massive tech debt cleanup and additional time needed to spec and review.

trollbridge 2 minutes ago||

Misuse of AI tools because of continuing a fundamentally broken software development process.

gonzalohm 2 hours ago||

In my opinion, the problem is not even the cost. The problem is that people are using AI for running recurrent stuff instead of writing code to automate it.

For example, imagine that you are comparing two documents (let's assume diff doesn't exist). You could ask an AI to compare the differences for you, or you could use AI to write a tool to do it. For whatever reason, people are starting to go with the former, not realizing that now they basically have to pay every time they compare documents.
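As an illustration of the "write the tool once" option, here is a minimal sketch using Python's standard `difflib` module. The function name `compare_documents` and the sample strings are made up for this example; once a tool like this exists, each comparison costs no tokens.

```python
import difflib


def compare_documents(old_text: str, new_text: str,
                      old_name: str = "old", new_name: str = "new") -> list:
    """Return unified-diff lines describing how new_text differs from old_text.

    An empty list means the two documents are identical line by line.
    """
    diff = difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile=old_name,
        tofile=new_name,
        lineterm="",  # input lines carry no newlines, so add none to headers
    )
    return list(diff)


if __name__ == "__main__":
    before = "alpha\nbeta\ngamma"
    after = "alpha\nBETA\ngamma"
    for line in compare_documents(before, after):
        print(line)
```

The result is deterministic and repeatable, which is the point being made above.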

bluejay2387 1 hour ago||

I have exposure to AI initiatives at several companies including a few F500s. I have seen teams dump huge logs into frontier models that took hours to get so-so results that we were able to replace with a few lines of Python code at 1000 times the speed and 100% accuracy. When asked why they were doing this they literally said "because we don't understand the subject matter so we were depending on the AI". I saw one team file a complaint with a vendor about a frontier backed coding harness and its inability to consistently format headers because they were using it as a reporting engine. When I recommended they just use the coding tool to write code to generate reports you would have thought I had just cured cancer from their response. I frequently see people complain about the fact that AI is going to take their jobs and then see them gripe about the fact that AI is 'worthless' because it can't do more of their job than it already does. It's easy to see the difference between the people seeing 10x productivity gains from leveraging AI and those who aren't, and it's not the AI.

sbarre 50 minutes ago|||

I've heard this framed as "AI raises the floor by 2x or less but raises the ceiling by 10x or more"

irishcoffee 23 minutes ago|||

Someone asked me if I was using models for fantasy sports, and if it was smart enough to help make decisions about drafting.

My answer: no, but it was able to help me find the website and social handles for every beat writer for every team, and generate a simple website where I can do a daily skim of teams/players and draw my own conclusions.

LLMs are a tool, not a panacea.

jerojero 50 minutes ago|||

Because you look at the work from the perspective of a programmer, not the perspective of a regular person.

Normal people have never gone around automating their work. The most automation they do is dynamic tables on Excel sheets.

I obviously know building a tool that can programmatically do something is a better solution, but I think that requires a fundamental shift in how people work. People need to be told by someone "this is how you should be using the AI" but right now they're simply told "use the AI".

throwatdem12311 1 hour ago|||

Laziness, pure and simple. The inevitable consequence of “the LLM is the compiler now”. And what do you even expect people to do when they are forced at threat of termination to use AI for everything as much as possible? Not to mention people are being pressured to do insane things like review hundreds of pull requests per day and deliver like 15 features per week so OBVIOUSLY there isn’t time to build out proper tooling. Just shove everything in a prompt and call it a day. Some people have families to feed, just do what you’re told.

CompoundEyes 1 hour ago|||

Agreed. I’ve been telling my team to build up internal packages so we can push all that ad hoc reinvention into something more tangible and deterministic. Invest the $$$ in inference into something the agent can reach for next time that’s neutral and consumable by other code to reduce future spend.

trollbridge 30 seconds ago||

Yes. Build compact CLI-driven tools, write a skill for it (you can use your agent to do most of this work for you).

It just requires being willing to think instead of mashing prompts into a keyboard.

bilekas 1 hour ago|||

It's this and worse. To use your example, it's like people using AI to write a diff algorithm, incorrectly, then using AI to fix it, because they don't know that diff exists already. Laziness and starting development with a very low level of understanding. People think lowering the barrier to entry is a good thing, when in reality there are just fundamentals and things you just have to know before you can start using a tool like LLMs properly.

plmpsu 1 hour ago|||

AI can do things around semantic analysis that a deterministic diff tool cannot.

I understand and agree with your point though.

bilekas 1 hour ago||

I'm curious if you could give me an example of something that couldn't be done deterministically. We have fuzzy search/matching too. Regex is a monster when used correctly.

SpicyLemonZest 4 minutes ago||

I sometimes find myself with thousands of log lines from a problematic execution and a known good reference, wondering nonspecifically if "something weird" happened in the first one. I don't think there's any matching-based solution there; you need a scan process that understands variations in execution time, object identifiers, etc. aren't meaningful.
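For context on where the line falls, here is a rough sketch of the deterministic part of that scan: mask fields that vary between runs, then diff what remains. The three regex patterns are assumptions about what a log might contain (ISO-like timestamps, UUIDs, millisecond durations), not a general solution, and this does nothing to judge whether a surviving difference is actually "weird".

```python
import difflib
import re

# Hypothetical patterns for fields that legitimately vary between runs.
VOLATILE = [
    (re.compile(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?Z?"), "<TS>"),
    (re.compile(r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b"), "<UUID>"),
    (re.compile(r"\b\d+(?:\.\d+)?ms\b"), "<DURATION>"),
]


def normalize(line: str) -> str:
    """Replace run-specific values with fixed placeholder tokens."""
    for pattern, token in VOLATILE:
        line = pattern.sub(token, line)
    return line


def unexpected_lines(reference_log: str, suspect_log: str) -> list:
    """Return normalized lines present in the suspect run but not the reference."""
    ref = [normalize(line) for line in reference_log.splitlines()]
    sus = [normalize(line) for line in suspect_log.splitlines()]
    # ndiff marks lines unique to the second sequence with a "+ " prefix.
    return [d[2:] for d in difflib.ndiff(ref, sus) if d.startswith("+ ")]
```

Note that masking durations deliberately hides timing differences; anything subtler than a missing or extra line still needs the kind of understanding described above.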

m3nu 50 minutes ago|||

100% this. For my own company I mostly build deterministic workflows that may have a simple AI step in the middle using an appropriate Chinese model in a very limited way. I wouldn't want to burn tokens to satisfy some metric.

With this AI is a fallback and not the default. Sounds like large companies have it backwards.
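A minimal sketch of that "fallback, not default" shape, under stated assumptions: the task (pulling a total out of free text), the regex, and the `llm_fallback` callable are all hypothetical stand-ins. No real model API is called here; the caller would supply whatever client they use.

```python
import re
from typing import Callable, Optional, Tuple

# Deterministic first pass: handles the common, well-formed case for free.
TOTAL_RE = re.compile(r"total\s*[:=]\s*\$?(\d+(?:\.\d{2})?)", re.IGNORECASE)


def extract_total(
    text: str,
    llm_fallback: Optional[Callable[[str], float]] = None,
) -> Tuple[float, str]:
    """Return (value, source), where source says which path produced the value."""
    match = TOTAL_RE.search(text)
    if match:
        return float(match.group(1)), "deterministic"
    if llm_fallback is None:
        raise ValueError("no deterministic match and no fallback configured")
    # Only inputs the cheap path cannot handle ever reach the model.
    return llm_fallback(text), "llm"
```

Returning the source alongside the value makes it easy to measure how often the fallback actually fires, which is the number that drives token spend.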

rich_sasha 1 hour ago|||

Isn't that the supposed point of it though? At least how it is marketed/hyped. Don't use your brain, you don't need one, spend all your thinking energy on... dunno, something else, and leave all the "mundane" stuff to AI. Just pay for the tokens, it's going to make you 10x more efficient, the $1000/month is worth it.

avereveard 2 hours ago|||

Same, even Opus favors short-term solutions and scripts with a billion flags that constantly require rescanning to understand how to launch them. It is a constant struggle to get it to build sane defaults and reusable scripts that run with minimal parameters.

gonzalohm 22 minutes ago||

Yeah, and what's up with adding dry run to everything? I saw some code that doesn't write anything but still the AI added a dry run which had a completely different codebase

dawnerd 1 hour ago|||

Same with writing boilerplate code. It’s been a solved problem yet here we are.

r_lee 1 hour ago|||

it's all about cost at the end of the day. if you're allowed and encouraged to tokenmaxx, then of course this'll happen.

cyanydeez 1 hour ago|||

Oh no! People are doing what they've been told to do!

jgalt212 1 hour ago||

I agree, but even this use case isn't the most wasteful. The interwebs says Agentic consumes 50% of token use, but I'd hazard this number is north of 90% for many shops. My cynical view of Agentic is its sole purpose is to make "number go up".

id 57 minutes ago||

Look at me! I'm the smartest guy. I've wasted 10M tokens! No one has wasted more!

dgellow 14 minutes ago||

The cost is a problem, but IMHO more important is delegating so much of your internal knowledge, thinking, and systems to a 3rd party.

We are very close to the point where if Claude and ChatGPT APIs are down, companies cannot function. How is that introduced so quickly into so many critical places without taking that specific fact into consideration? What is the plan for all those companies whose workflows now depend heavily on a remote LLM whenever the services get cut? What if your company account gets banned?

In some ways it is worse than depending on a company for hosting, because even your debugging tools are based on AI. MCP is great to go through datadog, sentry, until your agent or the MCP server are down and you don't know how to look for the issue yourself because you do not actually understand how your systems work.

lumost 4 minutes ago||

They are likely also starting to realize that the end result of their Anthropic contract is that nobody but Anthropic knows how to run their business. Why would Anthropic not treat their business like a utility in the future?

cs702 1 hour ago||

There's an old saying, "in the land of the blind, the one-eyed man is king."

Here we have the opposite: In the land of the one-eyed, the blind are leading.

The blind in this case are all those executives and managers who don't understand much about AI's current potential and limitations, and so far have treated it like a magic button that will solve everything. The one-eyed are rank-and-file employees who maybe sort of know a little more about AI.

scronkfinkle 2 hours ago||

On the one hand, organizations are without question using LLMs well beyond what is actually necessary, and as reality kicks in they're forced to scale back accordingly. However at the same time, on intervals counted in months, we're seeing breakthroughs both in hardware and software that dramatically reduce the cost of inference.

Between corporate FOMO and the rapidly decreasing costs of actually running LLMs, I'm interested to see where on the spectrum these two meet

wg0 58 minutes ago||

The other day we (wrongly) concluded that product market fit has been achieved and now the rivers of hot molten milk chocolate and honey are all that's in the future etc.

1970-01-01 1 hour ago||

Would have been nice to see 'soaring costs' with numbers. WSJ could do better here. Hundreds of thousands of dollars a month is nothing compared to how much they take with better financial models.

Majeh905 38 minutes ago|

Don't have a subscription to wsj.

Only thing I can say AI was useful for, in a corporate environment, was learning a new coding language on the fly. Gives me a baseline to work off of and fix.

But I can learn without it, too. A nice tool, but not a need.
