As a localLLM evangelist, I am hopeful this will bring more attention to the joys of rolling your own sovereign AI.
Maybe I should be aiming for something targeting 48 GB of memory?
https://carteakey.dev/blog/local-inference/local-llm-optimiz...
https://botmonster.com/ai/self-hosted-ai-agent-frameworks-20...
Personally I find myself swapping models depending on whether I am engaged in “trad-development” or building agentic probes or apps involving imagery. Tailscale the LLM to your deployments and ta-da!
OpenRouter is the best guide to real costs.
And much more informative than the speculation and guessing in the article.
Do these knowledge jobs have a significant corpus of not only knowledge but also discussion and problem solving, all conveniently labelled for the AI to train on? Probably not. Coding has Stack Overflow; what does, say, advertising use?
Advertising has centuries of print ads, 100 years of radio advertising, 70 years of TV commercials, etc. And modern AI does not necessarily need labeling.
If you decided to boycott every company that replaced staff with automation, you would be forced to exit the economy. Every company does this to some degree and the customers who vote with their wallet do not seem to care about a reduction in force.
[1]: https://arstechnica.com/ai/2026/06/gm-installs-robots-at-fla...
The same is not true for the software industry execs.
That’s usually a sign that sales are not “just fine”.
I worked at Verizon during their layoffs last year. Biggest layoffs in the USA.
As someone who’s been laid off before, I knew that it generally boosts the stock price.
I bought VZ because of that. It’s up 15% since the layoffs.
Microsoft, an AI stock, is down 30% in the same timeframe.
And then remarks like this:
Anthropic, OpenAI and Microsoft have all now transitioned customers from subscriptions to token-based pricing.
Huh? I use OpenAI via a subscription, as does anyone else using GPT-5.5-Pro who isn't a multimillionaire.
Please tell more :). Do you pay per token from Bedrock / OpenRouter / somewhere else? How many tokens do you use over the month, and how many for each task? Which harnesses?
Pay for OpenAI Pro directly, but I’m the only guy that uses Codex. $100 a month. My nontechnical partner likes to talk to ChatGPT 5.5 Pro for image related tasks (think generating interior decorating pics).
The nontechnical staff use a Gemini account on a Google family AI Pro sub. I use Antigravity when working on Android or Google Cloud API codebases.
Everyone gets OpenCode Go. The cost is trivial. $10 a month per person.
Pay for MiMo directly. We use it during Chinese off peak hours though. Total spend so far $25 in last month.
We run a few Qwen models locally and pretty much have them pegged all day. RTX 5090 on a PC and a Mac Studio.
There’s also Grok which is used for Imagine for artistic / graphic design related work. I also use the subscription for a vision model in my oh-my-pi harness.
We’re having discussions about how to pull in GLM-5.2 cost effectively. We compete with third world development shops so we can’t really pass on inference costs, but we can benefit from getting jobs done for customers faster. But ⅔ of our work is either internal or open source projects we can’t bill for.
Team size, if you don't mind?
> We're having discussions about how to pull in GLM-5.2 cost effectively
Are you evaluating Alibaba's token plan ($50/mo), which includes the Qwen3, MiniMax M2, Kimi K2, and GLM5 series?
I have not yet checked out Alibaba’s plans. We’re still just using OpenCode Go for GLM-5.2 and Qwen-3.7-Max.
I haven’t looked into MiniMax M3 much at all due to cost.
I can manage this budget with the Chinese models in AWS Bedrock. However, in my experience, they aren't as good as Claude today.
How do you know that the other models you are referring to aren't subsidized?
We have a pretty good idea of how much it costs to serve these models. You can pencil out the economics and guess at the model sizes and we know pretty decently how expensive the hardware is.
This is like claiming it's meaningless to guess the margins of a restaurant without going into their books and seeing the exact receipts and recipes.
They ain't doing dark arts in the back. You can guess at what goes into the food based on similar recipes, and at how much that costs based on what you pay at the grocery store.
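For what it's worth, the "pencil it out" approach can be sketched in a few lines of Python. This is only a rough back-of-envelope model, and every input below (GPU rental price, GPU count, aggregate throughput, utilization, list price) is a made-up placeholder rather than a figure for any real provider or model. The function names are hypothetical too.

```python
def cost_per_million_tokens(gpu_hourly_usd, num_gpus, tokens_per_second, utilization):
    """Estimated serving cost in USD per one million generated tokens.

    tokens_per_second is the aggregate throughput of the whole node
    (all concurrent requests batched together), and utilization is the
    fraction of each hour the node spends doing billable work.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    node_cost_per_hour = gpu_hourly_usd * num_gpus
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return node_cost_per_hour / tokens_per_hour * 1_000_000


def gross_margin(price_per_million_usd, cost_per_million_usd):
    """Fraction of the list price left after the estimated serving cost."""
    if price_per_million_usd <= 0:
        raise ValueError("price_per_million_usd must be positive")
    return (price_per_million_usd - cost_per_million_usd) / price_per_million_usd


if __name__ == "__main__":
    # All placeholder assumptions: an 8-GPU node rented at $2 per GPU-hour,
    # serving 2000 tokens/s in aggregate, busy half of the time.
    cost = cost_per_million_tokens(
        gpu_hourly_usd=2.0, num_gpus=8, tokens_per_second=2000, utilization=0.5
    )
    # Placeholder list price of $10 per million tokens.
    print(f"cost per 1M tokens: ${cost:.2f}")
    print(f"gross margin at $10/1M: {gross_margin(10.0, cost):.0%}")
```

The point is less the specific output than the sensitivity: swap in your own guesses for model size (which drives GPU count and throughput) and utilization, and see how far the list price would have to be from cost before "subsidized" is the only explanation. It ignores training, staff, networking, and depreciation, so treat it as a floor on cost, not a full P&L.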
https://sequoiacap.com/article/follow-the-gpus-perspective/
https://sequoiacap.com/article/ais-600b-question/
https://www.wheresyoured.at/brokenomics/
https://www.wheresyoured.at/exclusive-openai-financials/
https://www.wheresyoured.at/news-microsoft-to-shift-github-c...
https://archive.is/m5MHe#selection-1483.0-1483.74
https://www.youtube.com/watch?v=MNQDrF0HjtI
https://www.youtube.com/watch?v=VBHSjzHW-C8
https://www.derekthompson.org/p/the-great-ai-cost-panic-of-2...
https://www.tomshardware.com/tech-industry/artificial-intell...
https://www.tomshardware.com/tech-industry/artificial-intell...
https://blog.dshr.org/2025/10/depreciation.html
https://x.com/ThierryBorgeat/status/2060069195975422281
https://wlockett.medium.com/the-ai-industry-is-panicking-db5...
https://www.sofi.com/learn/content/average-salary-in-us/
https://www.theglobalstatistics.com/united-states-labor-stat...
https://www.bls.gov/news.release/pdf/ecec.pdf
https://www.businessinsider.com/ai-bubble-heads-doomers-sam-...
https://www.wsj.com/tech/ai/openai-considers-drastic-price-c...
https://www.bloomberg.com/opinion/articles/2026-06-11/anthro...
https://arstechnica.com/ai/2026/06/anthropic-pauses-token-ba...
https://x.com/bcherny/status/2040206441756471399?lang=en
https://code.claude.com/docs/en/agent-sdk/overview
https://windowsforum.com/threads/microsoft-plans-june-30-202...
https://www.datacenterdynamics.com/en/news/anthropic-to-use-...
https://techcrunch.com/2026/06/05/google-will-pay-spacex-920...
https://backofmind.substack.com/p/tokenalysis-and-john-henry
HN commenters quickly attack anything from Ed Zitron these days
But this seems to be flying under the radar