AI's Affordability Crisis

Posted by ilreb 14 hours ago

AI's Affordability Crisis (blog.dshr.org)

261 points | 348 comments | page 5

sleepybrett 13 hours ago|

It's funny when you doomscroll and watch all these Anthropic guys talking about how you should be writing self-improving loops, and that's all they do. Of course that's all they do, they don't have to pay for their tokens.

manapause 12 hours ago|

Can confirm: my experience with “loop engineering” was “this is neat” for 45 minutes, until a daily ration of tokens had evaporated. The quadratic cost trap is prohibitive to experimentation.
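A rough sketch of why the cost is quadratic, assuming the agent resends its whole conversation history as input on every turn and that nothing is cached. The function name and the 2,000 tokens per turn are made up for illustration, not measured from any real harness:

```python
def cumulative_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens billed when every turn resends the full history.

    Assumption: each turn adds `tokens_per_turn` tokens to the context and
    no prompt caching applies, so the whole history is billed again.
    """
    total = 0
    context = 0
    for _ in range(turns):
        context += tokens_per_turn  # history grows by one turn's worth
        total += context            # the full history is sent as input again
    return total


if __name__ == "__main__":
    # Hypothetical loop lengths; doubling the turns roughly quadruples the bill.
    for turns in (10, 20, 40):
        print(turns, cumulative_input_tokens(turns, tokens_per_turn=2_000))
```

The total works out to tokens_per_turn * n * (n + 1) / 2, so a loop that runs twice as long costs about four times as much. Prompt caching, where available, would lower the price of the repeated prefix but not the shape of the curve.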

As a local LLM evangelist, I am hopeful this will bring more attention to the joys of rolling your own sovereign AI.

sleepybrett 12 hours ago||

Yeah, I'm hoping that gets smoother. I've been experimenting with omlx and opencode on my m5x64gb and keep running into issues w/ Qwen3.6-35B-A3B-MLX-8bit exceeding its memory limit at the most inopportune times. Playing with 12B gemma4 (8bit) more today.

Maybe I should be aiming for something targeting 48gb of memory?
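For a back-of-the-envelope check on what fits, here is a minimal sketch that estimates the memory taken by the weights alone from parameter count and quantization width. It assumes all parameters have to be resident (the usual case even for mixture-of-experts models with few active parameters), and the `overhead_gb` and `budget_gb` values are placeholders for KV cache, runtime, and OS limits, not measurements:

```python
def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def fits(params_billion: float, bits_per_weight: int,
         budget_gb: float, overhead_gb: float) -> bool:
    """True if weights plus an assumed overhead stay within the budget.

    `overhead_gb` is a hypothetical allowance for KV cache and runtime
    buffers; it grows with context length and is not derived here.
    """
    needed = weight_memory_gb(params_billion, bits_per_weight) + overhead_gb
    return needed <= budget_gb


if __name__ == "__main__":
    # Hypothetical budget and overhead, chosen only to illustrate the check.
    for name, params, bits in (("35B @ 8-bit", 35, 8),
                               ("35B @ 4-bit", 35, 4),
                               ("12B @ 8-bit", 12, 8)):
        print(name, weight_memory_gb(params, bits),
              fits(params, bits, budget_gb=48, overhead_gb=10))
```

By this estimate a 35B model at 8 bits needs about 35 GB before any context is loaded, so long agent sessions with a large KV cache are where a limit would most likely be hit.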

manapause 11 hours ago||

It depends on what your goals are and what you are using it for. This space is fluid, and my answer last week would be different from my answer today! That said, there’s no substitute for hard work. Here are some resources to get you up to speed:

https://carteakey.dev/blog/local-inference/local-llm-optimiz...

https://botmonster.com/ai/self-hosted-ai-agent-frameworks-20...

Personally, I find myself swapping models depending on whether I am engaged in “trad-development” or building agentic probes or apps involving imagery. Tailscale the LLM to your deployments and ta-da!

HDThoreaun 13 hours ago||

I really can’t stand when writers point to the difference in price per token on the api and subscription and use that as evidence that inference loses money. This author even says it’s implausible that the api charges 4x marginal cost when I think it’s very likely even higher than that. The entire rest of the post sits on this faulty assumption. Fixed costs don’t matter when marginal revenue is profitable and growing rapidly. The ai labs only have 2 questions. Can they prevent users from switching to open source models? Can they scale the number of users on enterprise plans the way they did for coding but in a more general way for all knowledge jobs?

jimbokun 13 hours ago||

Then what are the real costs?

martinald 13 hours ago||

Wrote this a while back. https://martinalderson.com/posts/no-it-doesnt-cost-anthropic...

OpenRouter is the best guide to real costs.

jimbokun 12 hours ago|||

Thanks, that’s exactly what I was looking for!

And much more informative than the speculation and guessing in the article.

anthonypasq 11 hours ago|||

Agreed, this doesn't even account for prompt caching or the fact that Anthropic has substantial proprietary efficiencies on their inference stack specific to their models and scale.

bcjdjsndon 13 hours ago||

> Can they scale the number of users on enterprise plans the way they did for coding but in a more general way for all knowledge jobs?

Do these knowledge jobs have a significant corpus of not only knowledge but discussion and problem solving, all conveniently labelled for the AI to train on? Probably not. Coding has stack overflow, what does, say, advertising use?

HDThoreaun 13 hours ago|||

I agree this is a hard problem for the labs. I would be hesitant about “probably not” though. There is just as much marketing copy floating around as there is coding training data. I struggle a bit with this question because I’ve only ever worked as a software engineer, so I can’t exactly make claims about all the work other jobs do. But one example: I was talking to a doctor friend of mine the other day. He said he had to take his recertification exam recently, put the questions into ChatGPT, and thought it gave answers that were generally more thoughtful and correct than his own. Does that mean doctors are done? Of course not, but he’s now pushing hard for more AI tool use in his practice.

warkdarrior 13 hours ago|||

> Coding has stack overflow, what does, say, advertising use?

Advertising has centuries of print ads, 100 years of radio advertising, 70 years of TV commercials, etc. And modern AI does not necessarily need labeling.

deweywsu 12 hours ago||

I know a lot of level-headed engineers here may not side with me, but I say let the companies who abandoned their people at the drop of a hat, with CEOs who waved their flag around on social media, proudly declaring how they'd now run their companies with 75% fewer employees, wither and die. If I had been let go, there's no way I'd go back to a company like that, and there should be a public blacklist of CEOs who acted this way. These CEOs are not holistic thinkers, and are too susceptible to mass hysteria and too irresponsible toward real people and their lives to be trusted with the vision for any company ever again.

oreally 12 hours ago||

Someone should keep track of a public database of CEOs who cut workforce while making huge profits. Name, context, situation and all.

deadmutex 11 hours ago|||

Unfortunately, maintaining an opposite list would probably be easier.

CamperBob2 11 hours ago|||

That's basically what you'll see if you open a newspaper to the stock page. That's the idea behind business. It's why you have what you have.

knollimar 11 hours ago||

Depends on if it's shortsighted by giving up your intellectual capital for short term profit.

SamuelAdams 12 hours ago|||

GM just did this in the last 30 days [1], and their sales are likely going to be just fine. In fact the auto industry has repeatedly automated jobs over the last 100 years, and they still make decent sales numbers.

If you decided to boycott every company that replaced staff with automation, you would be forced to exit the economy. Every company does this to some degree and the customers who vote with their wallet do not seem to care about a reduction in force.

[1]: https://arstechnica.com/ai/2026/06/gm-installs-robots-at-fla...

MadxX79 12 hours ago|||

Robots that replace auto industry factory workers exist; the CEO of GM didn't imagine them as part of some sort of business media induced psychotic episode.

The same is not true for the software industry execs.

pragmatic 12 hours ago||||

GM is running 0% interest, no payments until n deals right now.

That’s usually a sign that sales are not “just fine”.

coryrc 12 hours ago||

They always are.

smahs 12 hours ago||||

The above comment, to which you responded, wrote about CEOs who responded to mass hysteria, not those who automated anything.

deweywsu 12 hours ago||||

This is true, and I'm sure AI cuts will continue, but it's obvious that the ones who went "all in" at AI's mass introduction were drinking a special kind of Kool-Aid reserved for the truly sycophantic Wall Street lap dogs, not the CEOs who think about risk and are cautious about betting the farm on a relatively new and mostly untested technology. GM is over 100 years old, and no doubt released improvements that were well-tested and predictable, because you don't take massive chances with a company that well established. It was a couple years into the mass AI deployment that studies on the minimal overall productivity gains of AI even started to come out(!) This was "get on the bandwagon" thinking at a massive scale, which shows you how many CEOs are not independent thinkers at all, but are really just followers. Yes, use AI, but do it responsibly, never forgetting that your investors aren't your only stakeholders - so are your people.

johnvanommen 12 hours ago|||

> GM just did this in the last 30 days [1], and their sales are likely going to be just fine. In fact the auto industry has repeatedly automated jobs over the last 100 years, and they still make decent sales numbers.

I worked at Verizon during their layoffs last year. Biggest layoffs in the USA.

As someone who’s been laid off before, I knew that it generally boosts the stock price.

I bought VZ because of that. It’s up 15% since the layoffs.

Microsoft, an AI stock, is down 30% in the same timeframe.

caconym_ 11 hours ago||

I'll believe it when I see it, but I would love to see it.

trollbridge 13 hours ago||

The article fails to mention DeepSeek, Alibaba, Qwen, Xiaomi, MiMo, z.ai, or GLM. It's hard to take seriously an article that doesn't do this. (Our monthly total spend is around $180 with a team of 6, about half technical; our biggest line items are for American models or subscriptions which we probably will be planning to get rid of.)

And then remarks like this:

  Anthropic, OpenAI and Microsoft have all now transitioned customers from subscriptions to token-based pricing.

Huh? I use OpenAI via a subscription, as is anyone else using GPT-5.5-Pro who isn't a multimillionaire.

jwolfe 13 hours ago||

They're referring to Enterprise customers, though should have been clear about it. Enterprise plans on Claude for example no longer include any baseline tokens. It's 100% usage based pricing.

trollbridge 11 hours ago||

True, but my friends in Enterprise still just purchase Claude Code subs and expense them. They basically get an allowance of $500 or so per month to buy various tools, and of course are banned from Chinese models. (Claude, Codex, Antigravity allowed, basically.)

junior44660 13 hours ago|||

> Our monthly total spend is around $180 with a team of 6, about half technical; our biggest line items are for American models or subscriptions which we probably will be planning to get rid of.)

Please tell more :). Do you pay per token from Bedrock / OpenRouter / somewhere else? How many tokens do you use over the month, and how many for each task? Which harnesses?

trollbridge 11 hours ago|||

Pay for DeepSeek directly. One developer insists on having his own account and in theory expenses it, but he forgets to turn in $10 expense reports. (Total spend in last two months = about $45.)

Pay for OpenAI Pro directly, but I’m the only guy that uses Codex. $100 a month. My nontechnical partner likes to talk to ChatGPT 5.5 Pro for image related tasks (think generating interior decorating pics).

The nontechnical staff use a Gemini account on a Google family AI Pro sub. I use Antigravity when working on Android or Google Cloud API codebases.

Everyone gets OpenCode Go. The cost is trivial. $10 a month per person.

Pay for MiMo directly. We use it during Chinese off peak hours though. Total spend so far $25 in last month.

We run a few Qwen models locally and pretty much have them pegged all day. RTX 5090 on a PC and a Mac Studio.

There’s also Grok which is used for Imagine for artistic / graphic design related work. I also use the subscription for a vision model in my oh-my-pi harness.

We’re having discussions about how to pull in GLM-5.2 cost effectively. We compete with third world development shops so we can’t really pass on inference costs, but we can benefit from getting jobs done for customers faster. But ⅔ of our work is either internal or open source projects we can’t bill for.

ignoramous 2 hours ago||

> Pay for MiMo directly. We use it during Chinese off peak hours though. Total spend so far $25 in last month.

Team size, if you don't mind?

> We're having discussions about how to pull in GLM-5.2 cost effectively

Are you evaluating Alibaba's token plan ($50/mo), which includes the Qwen3, MiniMax M2, Kimi K2, and GLM5 series?

trollbridge 55 minutes ago||

6 people, 3 programmers, 3 non programmers who now use AI as much as anyone else.

I have not yet checked out Alibaba’s plans. We’re still just using OpenCode Go for GLM-5.2 and Qwen-3.7-Max.

I haven’t looked into MiniMax M3 much at all due to cost.

stavros 13 hours ago|||

Not the GP, but I use Opus for planning, Deepseek for actual coding (implementing the plan) and GPT for review. GPT is inexhaustible on the $20/mo plan, Deepseek is dirt cheap (maybe $10/mo) and Claude is Claude.

junior44660 13 hours ago||

GP is talking about API / token-based prices, that's why I asked.

stavros 13 hours ago||

I don't know, he said "subscriptions" in the line items, but eg I use Deepseek via the API.

junior44660 13 hours ago||

Ah maybe you're right.

I can manage this budget with the Chinese models in AWS Bedrock. However, in my experience, they aren't as good as Claude today.

cdata 13 hours ago||

I think the author is referring to enterprise customers. You aren't the "customer" in this case; you're the bait.

How do you know that the other models you are referring to aren't subsidized?

skeledrew 12 hours ago||

Subsidizing makes no sense when there's no moat, or even the possibility of one. Although it's very possible that China in general subsidizes Chinese labs in some way so they maintain pressure on US labs. But you only have to look at proxies such as OpenRouter to see that the individual providers aren't doing any subsidizing on per-token costs.

SirFatty 13 hours ago||

"Crisis"

NitpickLawyer 13 hours ago||

The Token Tension :)

holyknight 12 hours ago||

Most of the "affordability" and "pricing" discussion is pointless because we don't have any real numbers on their margins per token. So, yes, they are subsidizing their subscription plans compared to the API prices, but the API prices could already be stupidly inflated, so the relative price comparison is a nothing burger. Until we know (or at least get a hint) on their margins on API prices, any pricing discussion is pointless.

kingstnap 12 hours ago|

I don't understand this line of reasoning at all.

We have a pretty good idea of how much it costs to serve these models. You can pencil out the economics and guess at the model sizes and we know pretty decently how expensive the hardware is.

This is like claiming it's meaningless to guess the margins of a restaurant without going into their books and seeing the exact receipts and recipes.

They ain't doing dark arts in the back. You can guess at what goes into the food based on similar recipes and how much that costs based on what you pay at the grocery store.
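To make the "pencil it out" step concrete, here is a minimal sketch of the arithmetic, assuming serving cost is dominated by rented GPU hours. Every number below (hourly GPU price, GPU count, aggregate throughput, utilization, list price) is a hypothetical placeholder, not a figure for any real lab or model:

```python
def cost_per_million_tokens(gpu_hourly_usd: float, gpus: int,
                            tokens_per_second: float,
                            utilization: float) -> float:
    """Serving cost in USD per one million tokens.

    Assumptions: `tokens_per_second` is the aggregate throughput of the
    whole replica across all batched requests, and `utilization` is the
    fraction of each hour spent doing billable work.
    """
    tokens_per_hour = tokens_per_second * 3600 * utilization
    hourly_cost = gpu_hourly_usd * gpus
    return hourly_cost / tokens_per_hour * 1_000_000


def gross_margin(price_per_million: float, cost_per_million: float) -> float:
    """Fraction of the price left after the serving cost."""
    return (price_per_million - cost_per_million) / price_per_million


if __name__ == "__main__":
    # Placeholder inputs only; swap in your own guesses.
    cost = cost_per_million_tokens(gpu_hourly_usd=2.0, gpus=8,
                                   tokens_per_second=5_000, utilization=0.5)
    print(f"cost per 1M tokens: ${cost:.2f}")
    print(f"margin at a $10 list price: {gross_margin(10.0, cost):.0%}")
```

The point is only that the estimate has a handful of inputs, each of which can be bounded from public hardware prices and benchmarks. The result is highly sensitive to throughput and utilization, which is where most of the disagreement in this thread actually lies.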

1vuio0pswjnm7 6 hours ago||

Here are all the references in this blog post

https://sequoiacap.com/article/follow-the-gpus-perspective/

https://sequoiacap.com/article/ais-600b-question/

https://www.wheresyoured.at/brokenomics/

https://www.wheresyoured.at/exclusive-openai-financials/

https://www.wheresyoured.at/news-microsoft-to-shift-github-c...

https://archive.is/m5MHe#selection-1483.0-1483.74

https://www.youtube.com/watch?v=MNQDrF0HjtI

https://www.youtube.com/watch?v=VBHSjzHW-C8

https://www.derekthompson.org/p/the-great-ai-cost-panic-of-2...

https://www.tomshardware.com/tech-industry/artificial-intell...