Uber’s COO says it’s getting harder to justify money spent on tokenmaxxing

Posted by _____k 4 hours ago

Uber’s COO says it’s getting harder to justify money spent on tokenmaxxing (www.businessinsider.com)

166 points | 227 comments | page 4

whattheheckheck 2 hours ago|

The industry has to tokenmax to juice the revenue numbers. It's a big club.

deadbabe 2 hours ago||

Protip: skunkworks type side projects are a great way to do tokenmaxxing when you don’t have enough work coming in, but still need to burn tokens to look productive. And because side projects are only governed by you, you can truly go nuts and let scope creep run wild. Soon enough, you’ll be one of those engineers burning six figures a month on AI and people will be in awe of your abilities, probably even elevating you to key AI evangelist positions within your company. And if you actually create something cool, you’ll be praised for your use of AI, and you can just say you built it all in a day or two instead of slacking off for months on your real work.

phendrenad2 3 hours ago||

AI productivity hasn't been well studied yet, but I'm betting that we'll end up with some variation on Price's Law, i.e. some small subset of workers get most of the benefit, while most just burn tokens with little to show for it.

I also want to call out the false productivity opportunities AI offers. There are whole teams building their own "gas town" and not shipping features.

lorecore 3 hours ago||

Not all tokens are created equal. It's easy to use a ton of tokens by having agents work together in parallel. That's basically the equivalent of people spending time in meetings, hardly a productivity win. As with everything in development, results matter, how you get there doesn't (unless you're a bad manager).

irishcoffee 4 hours ago||

I just realized my company is months behind this curve. About to blow my token allocation. Before I do, anyone have requests? Sincerely.

kibwen 3 hours ago|

I hereby suggest you take the fragmentary excerpts of the infamous erotic stage play The Lusty Argonian Maid shown in The Elder Scrolls series of games and extrapolate them to 100,000 additional full-length acts.

dominotw 2 hours ago||

tangent: anyone have businessinsider subscription. i feel like they've really stepped up their game last few years.

paulpauper 3 hours ago||

many of these leading AI companies are operating at large losses and subsidizing users with VC money. Profitability will entail having to impose greater limits and raising prices, so this will reduce to some degree the value proposition of AI compared to humans.

7777777phil 4 hours ago||

As soon as tokens stop being subsidized, heavy agentic use will become at least as expensive as paying an (entry level) employee. When this happens many companies will trade off heavy token usage for (maybe a bit slower, bit less accurate) employees again.

Wowfunhappy 4 hours ago||

DeepSeek is an open weights model. It's possible the hosted versions are subsidized, but we know what it costs to run locally. And it's expensive, but it's also pretty clearly cheaper than an employee.

Of course, the latest DeepSeek models are not as good as Claude, but they're not super far off either.

amluto 3 hours ago|||

When you use DeepSeek’s first-party API, you are giving them your token stream. This has some training value, but it also has incredible amounts of, well, business intelligence value. When you tell AWS your secrets or your customer data, you can be fairly confident they won’t abuse that knowledge. When you give this data to, say, OpenAI, they more or less promise not to abuse it if you’re on an appropriate business plan. If you give it to DeepSeek, even incidentally as something your agent reads, I would be quite surprised if DeepSeek doesn’t mine it for whatever purpose they or the government feel is appropriate.

The risk of letting your agent read .env goes far beyond the risk that the agent itself does something you don’t like with the contents.

Wowfunhappy 3 hours ago||

But this shouldn't be a risk if you host the model locally.

irishcoffee 3 hours ago|||

They're not far off, but getting the same seamless integration as hosted models is a full-time job. I think what just happened is that devops is about to explode. What will naturally follow is local hosting of all the things when people realize subscription costs for cloud-whatever are absurd.

Gitlab is going to take off? This is not investment advice.

Wowfunhappy 3 hours ago||

> What will naturally follow is local hosting of all the things when people realize subscription costs for cloud-whatever are absurd.

Even acknowledging we don't know exactly what costs would look like in a world without VC money, wouldn't hosting models logically be cheaper to do at scale in a data center?

When I compared to the cost of running DeepSeek locally, I meant that we can treat that cost as a price ceiling, not the floor.

Groxx 3 hours ago||

Like how server hosting at scale in a datacenter is cheaper than running your own datacenter? Despite ~every company consistently concluding that hosting their own stuff is several multiples cheaper?

No, I think local stuff using also-useful-for-other-things hardware will vastly undercut cloud hosting when the free money pipeline shuts down, and will stay that way for roughly forever. That doesn't mean cloud stuff isn't useful, clearly it is, but adding another company in the middle is rarely the solution for reducing costs.

stult 3 hours ago|||

You're assuming the price won't come down as the tech matures. That seems like a big assumption, considering how quickly open weights models are catching up to frontier models, and how little effort has been invested so far in optimizing inference costs.

It's especially a crazy assumption to make relative to the costs of employing a human. The costs of paying an entry level employee are unlikely to go down at all, and even if those costs do decline, there's a floor they can't drop below (minimum wage at the extreme end), whereas companies are free to optimize agentic costs as close to zero as possible.

So you are assuming that a cost which is extremely susceptible to optimization but which no one has yet seriously attempted to minimize will remain perpetually above a cost which is much less susceptible to optimization, is already subject to enormous efforts to minimize, and has a legally mandated floor. That seems like a bad bet.

skybrian 3 hours ago|||

Maybe this just counts as “light use” since I’m a hobbyist programmer and I only run one coding agent session at a time, but I get about as much done as I did back when I was working while spending a lot of time browsing the Internet, etc.

I’ve spent $10-$20 a day using Claude to write code and closer to $5 a day now that I mostly use Deepseek and GLM, using API pricing (no subscriptions) since I don’t use Claude Code.

This is a rounding error for a company. So I think there’s plenty of room to use AI extensively while being more cost-conscious.

kingstnap 3 hours ago|||

A significant caveat is that there is a pricing mismatch that makes it so first parties can subsidize quite heavily.

Agents are expensive in large part because tool calls require round trips. It's because these APIs are stateless and not streaming, so you have to resend the whole context each time. This means you have roughly (#tool calls) x (1/2 context size) cached input tokens over any given session. Most API providers overcharge you by a huge amount for cached tokens. An exception is DeepSeek. Paying OpenAI $0.05 for 100k cached GPT5.5 tokens during a possibly 2 second round trip agent tool call is like paying $100/hr for what is likely to be ~10 to 20 GB of VRAM residence (holding the KV cache).

Or it got offloaded to NVME and you are paying $0.05 for that much PCIe bandwidth.
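
For anyone who wants to check the arithmetic, here is a rough Python sketch. The $0.05 per 100k cached tokens and the ~2 second round trip are the figures quoted above; the function names and the example session (50 tool calls, context ending at 200k tokens) are made-up assumptions, and the "half the context on average" estimate only holds if the context grows roughly linearly.

```python
def cached_tokens_resent(tool_calls: int, final_context_tokens: int) -> float:
    """Approximate cached input tokens billed over one agent session.

    Assumes the context grows roughly linearly, so each tool call
    resends about half of the final context on average.
    """
    return tool_calls * final_context_tokens / 2


def implied_hourly_rate(cost_per_round_trip: float, round_trip_seconds: float) -> float:
    """Convert a per-round-trip charge into an equivalent hourly rate."""
    return cost_per_round_trip * 3600 / round_trip_seconds


# Figure quoted in the comment: $0.05 per 100k cached input tokens.
PRICE_PER_CACHED_TOKEN = 0.05 / 100_000

# Hypothetical session (assumption): 50 tool calls, 200k tokens of final context.
tokens = cached_tokens_resent(tool_calls=50, final_context_tokens=200_000)
session_cost = tokens * PRICE_PER_CACHED_TOKEN

print(f"cached tokens resent: {tokens:,.0f}")
print(f"cached-input cost for the session: ${session_cost:.2f}")
print(f"implied rate for one 2 s round trip: ${implied_hourly_rate(0.05, 2.0):.0f}/hr")
```

With those inputs the round trip works out to about $90/hr, which is where the "like paying $100/hr" figure comes from.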

helloplanets 3 hours ago|||

More straightforward to talk about the hardware directly. Full Kimi K2.6 needs an 8x H200 node to run and serve around 20 heavy users. You can rent an 8x H200 node for around $30/hr.

I'd imagine GPT-5.5 and Claude Opus 4.7 could run just fine on a 16x H200 node and serve at least 10 heavy users without the token output getting choppy.
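
A back-of-the-envelope version of the numbers above, as a Python sketch. The $30/hr node price and the 20 and 10 user counts come from the comment; the 160 working hours per month, the $60/hr price for a 16x node (simply two 8x nodes), and full utilization during those hours are my assumptions, not quoted figures.

```python
def cost_per_user_hour(node_cost_per_hour: float, concurrent_users: int) -> float:
    """Rental cost of a GPU node split evenly across its concurrent users."""
    return node_cost_per_hour / concurrent_users


# Assumption: a full-time user is active about 160 hours per month,
# and the node is only rented (and fully shared) during those hours.
WORK_HOURS_PER_MONTH = 160

# From the comment: 8x H200 node at ~$30/hr serving ~20 heavy users.
open_weights = cost_per_user_hour(30.0, 20)

# Assumption: a 16x H200 setup costs twice the 8x price, i.e. ~$60/hr,
# serving the ~10 heavy users guessed at in the comment.
frontier_guess = cost_per_user_hour(60.0, 10)

for label, hourly in [("8x H200, 20 users", open_weights),
                      ("16x H200, 10 users", frontier_guess)]:
    monthly = hourly * WORK_HOURS_PER_MONTH
    print(f"{label}: ${hourly:.2f}/user/hr, about ${monthly:,.0f}/user/month")
```

Under these assumptions that is roughly $240 per user per month for the open weights case and $960 for the guessed frontier case, before any provider margin, idle time, or ops cost.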

saghm 3 hours ago|||

What's funny is that this apparently wasn't something that the Uber COO seemed to think about when their company is arguably one of the most successful ever at the "subsidize to drive down costs until you capture nearly the entire market" strategy.

cryo32 4 hours ago|||

This is what I’m betting on.

The financials don’t make sense now. Based on the expenditure the finances won’t ever make sense.

fredley 3 hours ago|||

I think if local models catch up with current SOTA then that might not happen. Either way, I don't think the long-term outlook for OAI, Anthropic etc. really holds up.

BadBadJellyBean 4 hours ago||

I have been saying the same for a while. Someone always says "but Anthropic is making money on their API" or "but inference will get cheaper". But I don't believe it. First, all the investments have to be paid off at some point, and second, there are other things that cost money. I don't believe that any of them have a positive balance sheet.

I also don't think that blitz scaling will work like with Uber. The engineers are still there. We can work without the LLM tools.

solenoid0937 3 hours ago||

If by "investments will pay off" you mean major profits, that's never going to happen as long as scaling laws hold. All revenue will just go to financing more compute, and either we hit AGI or have the greatest economic collapse in modern history.

The world will look drastically different 5 years from now, for better or worse, so save every penny (especially if you work in tech).

Rohunyyy 4 hours ago||

Now we are going to get a new profession. Token Engineer! They will be experts on tokenmaxxing! The job growth that the billionaire CEOs promised us from AI is finally here!

fsloth 4 hours ago|

Well, there are already offerings like githits (https://news.ycombinator.com/item?id=46105112) that sort of promise to optimize bang-per-buck of inference.

yapyap 3 hours ago|

wtv
