Posted by palashawas 18 hours ago

Computer Use is 45x more expensive than structured APIs (reflex.dev)
402 points | 230 comments
RadiozRadioz 37 seconds ago|
> The alternative, writing an MCP or REST surface per app, is its own engineering project

Well, if your backend was sufficiently decoupled from your frontend, and the server-side operations were designed thoughtfully and generically, it need not be an engineering project.

angry_octet 14 hours ago||
Great guidance hidden in here for making it expensive for agents to navigate your website. Move elements on screen as the mouse moves, force natural mouse movement to make the UI work, change the button labels in the JS to be randomly named every visit, force scrolling to the bottom of the screen to check for hidden extra tasks...

Hang on, that sounds like common corporate SaaS apps.

zmmmmm 10 hours ago||
It's really weird, I'm seeing across the board that people who never believed in them before are suddenly all into good software eng practices (starting with writing a spec) because of AI.

It's kind of fascinating that we never were willing to do these things for humans but now that AI needs it ... we are all in. A bit depressing in the sense that I think mostly the reason we're happy to do it for AI is that we perceive it will benefit us personally rather than some abstract future human.

majormajor 8 hours ago|||
> It's really weird, I'm seeing across the board that people who never believed in them before are suddenly all into good software eng practices (starting with writing a spec) because of AI.

> It's kind of fascinating that we never were willing to do these things for humans but now that AI needs it ... we are all in. A bit depressing in the sense that I think mostly the reason we're happy to do it for AI is that we perceive it will benefit us personally rather than some abstract future human.

I don't think that's the reason.

I think it's because they take time, and few people were willing to put in time for "maybe it'll make writing the actual code faster" gains when the code was going to take a few times longer to write itself.

You also can get faster feedback to iterate on your spec now, which improves the probability of it helping future-you.

So combine that with the fact that LLMs are more likely to get lost if you don't spec stuff in advance, and the value of up-front work is higher (whereas a human is more likely to land on the right track eventually, just more slowly, making the value harder to quantify).

Cthulhu_ 2 hours ago||
Yeah, I think a lot of pushback to best practices is basic cost/benefit; I like writing documentation, but I often feel a bit depressed that nobody will actually read it in as much detail as I wrote it. But LLMs do / can.

Actually there's a lot of projection there too; I don't read documentation in detail. And nowadays, I point an LLM at documentation so that it can find the details I would otherwise skip over.

The destruction of the millennial attention span is real, and it's worse in the younger generations, lmao.

noduerme 2 hours ago||
Well it's also just that you have a list of 20 features to add, and if it works, you want to ship it, and someone might even get mad if you spend a day dawdling on best practices and documentation and so on. Corporate cultures generally don't have the same long term thinking about reusability and legibility and fault-tolerance that an individual coder may have about the code they want to write once and forget. (Neither do LLMs, for that matter).
bloppe 9 hours ago||||
My friend at a faang was talking about the "massive overhauls to make everything ready for ai". I asked for an example. He said "basically just documenting the shit out of everything"

I guess that just never occurred to anybody before.

_heimdall 8 hours ago|||
The CEO of Uber made the same comment on Diary of a CEO recently. I think it was for their customer service team, if I'm not mistaken: they threw their existing docs at an LLM and it was all over the place because policies were poorly documented and defined. The team is now documenting everything from scratch, focusing on outcomes rather than process - TBD if it works out.
noduerme 2 hours ago||
Yeah, someone made the point in a popular post here recently that all the firings are reducing institutional knowledge. IMHO, replacing that knowledge with LLM-written documentation is even more potentially catastrophic. Just from organizations I've worked in, a lot of the useful human knowledge is in knowing how to handle either undocumented edge cases or situations where the documents are outdated or wrong. Working with LLMs and reminding them to update those docs every time? Good luck. And if it's something where the docs touch actual real world operations, that's an area where only human operators with hands-on experience are going to recognize the potential conflicts or cognitive dissonance.
majormajor 8 hours ago||||
Having the humans document the code seems backward (maybe that's not what they're doing, but "make everything ready for ai" sounds manual). And hopefully there aren't that many scary surprises that humans need to manually document.

One of the best parts of LLMs is that you can use them to bootstrap your documentation, or scan for outdated things, etc, far more quickly than ever before.

Don't just throw a mountain at it and ask it to get it right, but use a targeted process to identify inconsistencies, duplicates, etc, and then resolve those.

And then you have better onboarding material for the next human OR llm...

palmotea 5 hours ago||
> Having the humans document the code seems backward (maybe that's not what they're doing, but "make everything ready for ai" sounds manual).

No, that's forward. Any documentation an AI can make, another AI can regenerate. If an LLM didn't write the code, it shouldn't document it either. You don't want to bake in slop to throw off the next LLM (or person).

programmarchy 8 hours ago||||
AI might actually RTFM
Cthulhu_ 2 hours ago||
It would / should / can, but there are big efforts to reduce token consumption now, so AI will likely try to skim and pick through documentation just like real humans.
akoboldfrying 8 hours ago|||
There was a recent effort at work to make it possible for agents to provide up-to-date help on how to do various admin/setup tasks. A very sensible goal: We already have lots of documentation, the problem is that it's scattered everywhere and mostly out of date. Turns out the new solution amounted to someone manually going through it all and painstakingly preparing some Markdown files for consumption by said agent.

Somebody pointed out that those Markdown files might be helpful for people to read directly. Bit of an Emperor's new clothes moment. (I wanted to slap a :rolling_on_the_floor_laughing: reaction on it, but sadly it turns out I'm actually too chickenshit to do that in today's job market.)

GarnetFloride 9 hours ago||||
My manager just told me that after 12 years of trying to get one of the founders to understand the difference between dev docs and user docs, they tried getting Claude to do it and he finally understood that they are different. He'd been saying this whole time that customers could just read the dev docs. If they could, they wouldn't need our software.
zenoprax 7 hours ago||
How firm is the boundary between a dev doc and a user doc in your opinion? I have found that the overlap can be quite large if the users are also technically proficient. Right now I'm trying to balance "how X works so you can use the app better" with "how X works so you can contribute or build your own plugin". DeepWiki really helps as a backstop for anything not already covered though it's not without its own caveats of course.
PetitPrince 3 hours ago||
Not OP, but I think you have the right intuition in distinguishing between using the app and contributing to the app. You may want to read https://diataxis.fr/ which elaborates on this idea and adds another dimension (action / cognition).
zenoprax 14 minutes ago||
I appreciate the suggestion but that's what I've been using! :D

In fact, the only area I've been struggling with are "Concepts" because they have less clear boundaries for the right amount of detail.

Here is what I've been working on: https://github.com/super-productivity/super-productivity/wik...

DrewADesign 8 hours ago||||
I always knew the dev world leaned more toward interesting technical challenges and interoperability than maximizing the benefit to humanity; it’s why I switched to design. However, I didn’t realize the intensity of that preference until the entire industry got ridiculously AI-pilled.
taneq 9 hours ago||||
It’s an interesting psychological phenomenon. It’s like the way I keep my house way tidier since I got a robot vacuum. Pick things up off the floor for aesthetics’ sake? Nah. Pick them up because the vacuum will attempt to eat them and might get sick? Of course!
cheriot 7 hours ago|||
Better commit messages, better and more up to date docs, etc. It's not all slop!
notjustanymike 11 hours ago|||
Ah damn it, we invented Jira
fooker 10 hours ago|||
Jira from first principles

Almost sounds like an O'Reilly book

QuantumNomad_ 9 hours ago||
The O’Reilly animal for Jira is apparently some kind of duck or goose.

Matthew B. Doar (2011). Practical JIRA Plugins. O’Reilly.

https://www.oreilly.com/library/view/practical-jira-plugins/...

In case anyone was wondering. Which they probably weren’t :p

fooker 9 hours ago||
I'm more interested in the next volume: impractical Jira plugins
drob518 10 hours ago|||
Real LOL!
MereInterest 9 hours ago|||
The trick is that you make it something that humans want to do. Using [0] as an example, the interactive elements move, with context-dependent environment interactions.

[0] https://www.cs.unm.edu/~dlchao/papers/p152-chao.pdf

tdeck 5 hours ago|||
So ASP WebForms was the technology we needed all along?
cco 8 hours ago|||
You can have both! haha

We built isagent.dev for exactly this reason: serve human content to humans, serve agent-optimized content to agents.

jasomill 6 hours ago||
I had one project where a desktop application deliberately hid the contents of all grid controls from Windows accessibility APIs, took measures to ensure checkbox and radio button selections made through accessibility APIs did not register, and all functions that allowed data to be exported were protected by CAPTCHAs.

Generative AI wasn't a thing at the time, but I had to resort to a combination of OCR, simulated user input, and print capture to drive the application and export data.

Had the developers been aware of the Windows DRM APIs that block screen capture, or the fact that text is easily recoverable from PostScript files with minimal formatting, I don't know what I would have done.

The irony is that the process this replaced involved giving cheap offshore labor full read-only remote access to all data in the system, which was by any measure a far more serious security risk than otherwise-authorized employees automating their work with locally running, network-isolated tools from established, trustworthy vendors.

merlindru 16 hours ago||
I'm building something that fixes this exact problem[1].

The landing page doesn't advertise it yet, but essentially, I give agents a small set of tools to explore apps' surfaces, and then an API over common macOS functions, especially those related to accessibility.

The agent explores the app, then writes a repeatable workflow for it. Then it can run that workflow through CLI: `invoke chrome pinTab`

Why accessibility? Well, it turns out that it's just a good DOM in general: structure for apps. Not all apps implement it perfectly, but enough do to make it wildly useful.
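To make the "explore once, then replay" part concrete, here's a rough sketch of what a recorded workflow could look like - purely hypothetical format and field names, not invoke's actual storage:

  # Hypothetical sketch only - not invoke's real format. The idea: the agent
  # explores the accessibility tree once, records the actions that worked,
  # and later replays them without re-exploring (or re-spending tokens).
  workflow = {
      "app": "com.google.Chrome",   # target app bundle id (example)
      "name": "pinTab",
      "steps": [
          {"action": "press", "element": {"role": "AXMenuItem", "title": "Pin Tab"}},
      ],
  }

  def replay(wf):
      for step in wf["steps"]:
          # each step maps onto one accessibility call (AXPress, etc.)
          print(f"{wf['app']}: {step['action']} {step['element']['title']}")

  replay(workflow)  # conceptually what a CLI call like `invoke chrome pinTab` replays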

[1] https://getinvoke.com - note that the landing page is targeted towards creatives right now and doesn't talk about this use case yet

ctoth 16 hours ago||
If agents are what it finally takes to get good a11y, I'll take it. I'll bitch about it, but I'll take it.
tomjakubowski 14 hours ago|||
Playwright, the end-to-end testing framework for the web, provides a strong incentive to give sites good a11y: Playwright tests are an absolute delight to read, write and maintain on properly accessible sites, when using the accessibility locators. Somewhat less so when using a soup of CSS selector and getByText()-style locators.

One thing I am curious about is a hybrid approach where LLMs work in conjunction with vision models (and probes which can query/manipulate the DOM) to generate Playwright code which wraps browser access to the site in a local, programmable API. Then you'd have agents use that API to access the site rather than going through the vision agents for everything.
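For anyone who hasn't used them, here's roughly what the accessibility locators look like in Playwright's Python API (the URL and element names are made up for illustration; the locator methods are real):

  # Role-based locators read like the accessibility tree, not the DOM soup.
  from playwright.sync_api import sync_playwright

  with sync_playwright() as p:
      browser = p.chromium.launch()
      page = browser.new_page()
      page.goto("https://example.com/reviews")

      page.get_by_role("tab", name="Reviews").click()
      page.get_by_role("button", name="Next page").click()
      titles = page.get_by_role("heading", level=3).all_inner_texts()

      # the brittle alternative: a soup of CSS selectors
      # page.locator("div.r-1x2y3z > span:nth-child(4)").click()

      print(titles)
      browser.close()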

giancarlostoro 11 hours ago|||
This is precisely how the Playwright MCP works, which lets something like Claude directly test a website.

https://playwright.dev/docs/getting-started-mcp#accessibilit...

I've mentioned several times, and gotten snarky remarks for it, that rewriting your code so it fits in your head, and in the LLM's context, helps the LLM code better. People complain about rewriting code just for an LLM, not realizing that the suggestion is to follow better coding principles so the LLM codes better, which has the net benefit of letting humans code better too! Well, it looks like if you support accessibility in your web apps correctly, Playwright MCP will work correctly for you.

Amazing.

tyingq 11 hours ago||||
Was looking for this comment. I'd like to see this approach in the comparison: having the LLM build a Playwright script and use it. I suspect it would beat the API's time-to-market, and be close-ish in elapsed time per transaction.

Harder to scale if it's doing a lot of them, I suppose.

lsaferite 11 hours ago|||
Using playwright-cli with Claude code is highly effective for debugging locally deployed web apps with essentially zero setup.
pjc50 13 hours ago||||
Very real risk of this going in reverse: people building inaccessible websites to prevent AI use.
sciencejerk 6 hours ago|||
Or human engineers limiting AI-consumable documentation to improve job security!
solenoid0937 12 hours ago||||
Those people probably aren't working on anything useful anyways, so it's no big deal.
20k 11 hours ago||
I've found that by far the most useful websites as a programmer are also the ones most resistant to AI. This would be a huge loss for anyone vision impaired
claytonjy 11 hours ago|||
What sorts of sites are you thinking of? To me, “most useful to a programmer” evokes docs and blogs and github issues and forum posts. I suppose some forums might be AI-resistant (login wall), but the others are trivially AI accessible.
Rebelgecko 9 hours ago|||
Plenty of Linux-y websites use Anubis. Arch Wiki and IIRC some other distros too.
fc417fc802 8 hours ago||
That's less a value judgment, more a necessary evil due to the plethora of bad actors out there. I doubt it will get in the way of a local model used in a reasonable manner.

Most wikis you can mirror locally if you really need to hammer them.

irishcoffee 11 hours ago|||
GitHub is naturally LLM resistant via its new uptime feature… I’ll show myself out.
stingraycharles 11 hours ago|||
Examples, please.
stingraycharles 11 hours ago||||
That’s such an extremely small niche of people it’s not a real risk.
blurbleblurble 13 hours ago|||
"AI" is a made up hype thing. It's just computers and computer programs. For real!
merlindru 15 hours ago||||
i think this goes both ways too :) agents have been a boon for everyone with disabilities, carpal tunnel, RSI, ADHD, anything

and now the fact that interfaces need to be accessible to agents, not just humans, ironically increases accessibility for humans in return

lopis 14 hours ago||
And lets not forget that not all disabilities are chronic. Many disabilities are situational or temporary. AI is a great assist for a hangover day for example...
linkjuice4all 13 hours ago|||
I mean…I guess. But this is ridiculous - how many layers does our technology need to bash through to update two records on remote systems? I get that value is being added at some point - but just charge some micropayment for transactions. This is just too much.
lazide 13 hours ago||
Ever read Vernor Vinge’s a deepness in the sky? Digital archeologist, coming right up.
btown 11 hours ago|||
If you're on macOS and interested in this space, I highly recommend you open up the system-provided Accessibility Inspector.app and play around with apps and browsers. See how the green cells might guide an LLM to only need to read/OCR specific parts of a screen, how much text is already natively available to the accessibility engine, and how this could lead to really effective hybrid systems - not just MCPs, but code generators that can build and run their own scripts to crawl your accessibility hierarchy for your workflow!

I think this is very fertile ground - big labs need to use approaches that can work on multiple platforms and arbitrary workflows, and full-page vision is the lowest common denominator. Platform-specific approaches are a really exciting open space!
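If you want to poke at the same hierarchy programmatically, a rough sketch with pyobjc looks something like this (standard AX attribute names; the pid is a placeholder, and the calling process needs Accessibility permission in System Settings):

  # Rough sketch: dump an app's accessibility (AX) hierarchy via pyobjc.
  # Needs `pip install pyobjc` and Accessibility permission for the caller.
  from ApplicationServices import (
      AXUIElementCreateApplication,
      AXUIElementCopyAttributeValue,
      kAXRoleAttribute,
      kAXTitleAttribute,
      kAXChildrenAttribute,
  )

  def attr(element, name):
      # pyobjc turns the CFError out-parameter into an (err, value) tuple
      err, value = AXUIElementCopyAttributeValue(element, name, None)
      return value if err == 0 else None

  def dump(element, depth=0, max_depth=4):
      role, title = attr(element, kAXRoleAttribute), attr(element, kAXTitleAttribute)
      print("  " * depth + f"{role} {title or ''}".rstrip())
      if depth < max_depth:
          for child in attr(element, kAXChildrenAttribute) or []:
              dump(child, depth + 1, max_depth)

  dump(AXUIElementCreateApplication(12345))  # 12345: placeholder pid, e.g. from `pgrep -x Finder`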

jasomill 4 hours ago|||
Windows has similar APIs and tools, see, e.g.,

https://accessibilityinsights.io/

https://learn.microsoft.com/en-us/windows/win32/winauto/insp...

https://github.com/FlaUI/FlaUInspect

and for WPF applications specifically,

https://github.com/snoopwpf/snoopwpf

merlindru 10 hours ago||||
That's how I got into this thing in the first place, hah. Golden advice. It's incredibly cool to see what some apps offer. More of them have great accessibility support than you think (or at least than I thought!)
willwade 4 hours ago||||
take a peek at https://github.com/willwade/app-automate?tab=readme-ov-file#... - it's early and needs some work, but this is the idea behind it (my use case is not agents but actual real disabled people, who need tooling to provide better access to the desktop)
drob518 10 hours ago|||
Great idea.
gbriel 16 hours ago|||
This is a good solution: instead of everyone blowing tokens on repeating the same computer-use task, come up with a way to share the workflows. I think you'd need to make sure no shared workflows extract user information (passwords).
merlindru 16 hours ago||
this is protected against at the OS level, provided the applications declare the input correctly as a SecureTextField.

i so far haven't found any application that doesn't.

all you're able to get out, as far as i can tell, is the length of the entered password.

jasomill 4 hours ago||
From applications that capture the screen or use accessibility APIs, perhaps, but what about, e.g., Windows applications that capture window messages, e.g.,

https://devblogs.microsoft.com/cppblog/spy-internals/

Obviously, if you can inject code into a process that receives sensitive data, you're already running in a context where all security bets are off.

But with processes you yourself create, you probably can, even without elevated privileges, unless the application takes measures to prevent injection (akin to game anticheat mechanisms), so it seems worth pointing out that there are simple mechanisms to subvert such "protected" fields that don't require application-specific reverse engineering.

willwade 5 hours ago|||
Interesting! I started something - nowhere near as complete as that and quite different, but again using accessibility UI elements. The BIG problem I've found is that SOOOO much stuff does a really poor job of exposing these elements. Here was my approach: https://github.com/willwade/app-automate?tab=readme-ov-file#... - what I do here is build UI templates, either using UIAccess or using a single pass with a vision model.

Now the argument against this on [reddit](https://www.reddit.com/r/openclaw/comments/1s1dzxq/comment/o...)

"my experience is the opposite actually. UIA looks uniform on paper but WPF, WinForms, and Win32 all expose different control patterns and you end up writing per-toolkit handlers anyway. Qt only exposes anything if QAccessible was compiled in and the accessibility plugin is loaded at runtime, which on shipped binaries is basically never. Electron is just as opaque on Windows as on macOS because it's the same chromium underneath drawing into a canvas. the real split isn't OS vs OS, it's native toolkit vs everything else."

teej 16 hours ago|||
You should call it Braille
merlindru 16 hours ago||
shit, why didn't i think of that

i tend to think of invoke as "an API over macOS apps" tho...

doesn't `invoke finder shareAndCopyLink` read very nicely? :P

hellojimbo 14 hours ago|||
Isn't that basically what browserbase does? I've found the hardest parts of browser use to be stealth first, then client change management, then browser comprehension (which gets better with every new model).
merlindru 14 hours ago||
i'm not too familiar with browserbase, but invoke works with any macOS app (or at least the accessible ones), i think browserbase is only for browser usage.

in the context of this blog post, the conclusion looks similar though!

"use the whole web like it's an API"

works much better than

"figure out similar or identical tasks from a clean slate every single time you do them"

izend 13 hours ago||
Does https://github.com/webmachinelearning/webmcp overlap ?
merlindru 13 hours ago||
Not really IMO, webmcp has devs change their apps. invoke just works with existing apps, especially ones that are accessible

invoke rather has overlap with Claude's and Codex' computer-use, except the steps are stored/scripted.

webmcp is bottom-up. computer-use & invoke are top-down

theptip 10 hours ago||
I’m missing the premise. For internal apps why would you ever reach for Computer Use vs just having your agent whip up a cli or MCP?

_of course_ computer use is worse. It is your last resort. Do not use it on state that lives in a DB that you own.

If anything I am impressed that it’s only 50x worse.

jacktu 16 hours ago||
Totally agree. I’ve been building an AI visual tool recently and experimented with both approaches. The latency and cost of generic "agentic" browser use are absolute dealbreakers for real-time consumer apps right now. Structured APIs (even just chained LLM calls with strict JSON schemas) are not only 40x cheaper, but more importantly, they are deterministic enough to actually build a stable product on top of. Computer use is an amazing demo, but structured APIs are what pay the server bills.
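To be concrete about the "strict JSON schema" part, this is the kind of call I mean (a sketch using OpenAI structured outputs; the model name and schema fields are illustrative):

  # Sketch of one chained-LLM-call step constrained by a strict JSON schema.
  import json
  from openai import OpenAI

  client = OpenAI()

  schema = {
      "type": "object",
      "properties": {
          "rating": {"type": "integer"},
          "summary": {"type": "string"},
      },
      "required": ["rating", "summary"],
      "additionalProperties": False,  # required for strict mode
  }

  resp = client.chat.completions.create(
      model="gpt-4o-mini",  # placeholder model
      messages=[{"role": "user", "content": "Extract rating and summary: 'Great tool, 5 stars.'"}],
      response_format={
          "type": "json_schema",
          "json_schema": {"name": "review", "strict": True, "schema": schema},
      },
  )

  # The output parses every time, so the next step in the chain is plain code
  # instead of another screenshot-plus-vision round trip.
  print(json.loads(resp.choices[0].message.content))
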
ai_fry_ur_brain 16 hours ago|
"Agentic engineering" were always just FADs to bring in more revenue for token providers.

If I think an LLM is good for something, I create well-defined, very deterministic "middleware" for that purpose on top of OpenRouter.

k__ 15 hours ago|||
Agentic engineers can build well defined, very deterministic middleware on top of OpenRouter.

Anthropic even says that an agent-based solution should only be your last resort and that most problems are well served with a one-shot.

https://www.anthropic.com/engineering/building-effective-age...

ai_fry_ur_brain 15 hours ago||
Written 1.5 years ago. Anthropic would not advertise this stance today.

I'm much more amenable to that type of LLM workflow. Running "agents" with a monolithic "harness" for long-time-horizon tasks seems wasteful and unnecessary, but probably super appealing to lazy people.

wahnfrieden 15 hours ago|||
It’s not a fad or without value.
ai_fry_ur_brain 15 hours ago||
It's very valuable to lazy people who don't care about quality or doing hard things. I totally see the appeal for those people.
wahnfrieden 14 hours ago||
Sounds like you are more interested in performativity / aesthetics of production if you think writing software in a harder way is an indisputable virtue just because it requires more effort. On top of that you are an elitist about it

Agent use can be used to improve quality and maintainability

Worf 15 hours ago||
Is it possible to ask the vision agent to "map" the UI and expose it to another agent as a set of interfaces that more closely resemble an API? From what I understand, the vision agent currently has to know both that "next page" shows more results and that it needs to get more results in the first place.

If one agent just explores the UI, maybe in a test environment, and outputs a somewhat-structured description of the various UI elements and their behavior, and another agent is then given that description, would that agent perform better than one that both explores the UI and tries to accomplish the given task at the same time?

With an example UI I made up, the description (API-like interface definition) could be something like:

  Get all reviews:

  To get all the reviews you need to go to each page and click "show full review" for every review summary in that page.

  Go to each page:

  Start at page 1 (the default when in the Reviews tab). Continue by clicking the "next" button until the "next" button is no longer available (as you've reached the last page).
So the second agent can skip some thinking about how to navigate because it already has that skill. The first agent can explore the UI on its own, once, without worrying about messing up if there's a test environment.

Or am I misunderstanding the article completely? Probably. But it's interesting nonetheless. Sorry if it makes no sense.

nijave 10 hours ago||
That was my first thought as well. A lot of current web development relies heavily on code generation, then has obfuscation and compression slapped on top, leading to complicated structures. Then on top of that, more code (client side/JavaScript) reconfigures everything again. You end up with fairly complicated html/css/JavaScript to wade through.

For better and worse, 5-10 MiB isn't uncommon for a web app.

Instead of trying to go "bottom up" and, effectively, do what a browser engine is doing in reverse, it seems easier to go "top down" like a human does and go off the visual representation.

angry_octet 14 hours ago|||
I think you're right, you can get agents to do what we do -- learn how a website works. Then expose that model as a simple API. There will still be some vision tasks for navigation but they will be just vision tasks, no thinking required.
faangguyindia 10 hours ago||
>Is it possible to ask the vision agent to "map"

No, most vision models focus on a subset of an image at a time when doing image -> text.

Image -> image uses the whole image.

esperent 6 minutes ago||
[delayed]
oleg2025 3 hours ago||
A couple of months ago I was inspired by kubectl and built a desktopctl CLI to control GUI apps. It uses a combination of OCR and the Accessibility API on Mac, represents the UI as markdown, and exposes actions for mouse and keyboard.

My core idea was that the "fast" perception loop is fully local and GPU-optimised for UI tokenisation and change detection. The "slow" control loop requires an LLM roundtrip and uses a token-efficient markdown interface in the CLI output.

It uses relatively stable identifiers for controls, so agents can script common actions; e.g. `desktopctl pointer click --id btn_save` doesn't require the UI tokenisation loop.
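The stable identifiers are what make that kind of scripting practical. A hypothetical sketch of the idea (not desktopctl's actual scheme): derive the ID from attributes that rarely change, like role and title, so a recorded action survives layout shifts:

  # Hypothetical sketch, not desktopctl's actual ID scheme: derive a control ID
  # from role + title so `pointer click --id btn_save` keeps working when the
  # control moves on screen.
  def control_id(role: str, title: str) -> str:
      prefix = {"AXButton": "btn", "AXTextField": "txt", "AXCheckBox": "chk"}.get(role, "el")
      slug = title.lower().replace(" ", "_") or "unnamed"
      return f"{prefix}_{slug}"

  print(control_id("AXButton", "Save"))  # -> btn_save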

https://github.com/yaroshevych/desktopctl/tree/main

oleg2025 2 hours ago|
I've learned that compared to APIs, human interfaces are slow and messy, but there is actually a lot of science behind them. The good apps expose information well, and are optimised for clicks, typing, etc.

The best GUIs make great use of muscle memory, which makes them perfect candidates for scripting via CLI. E.g. a simple sequence like "open Notes app, hit Cmd+F, enter search term, read list of results" can be one Bash command invoked by an AI agent.
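As a rough sketch of that kind of one-shot command (Python wrapping osascript, standard System Events keystroke scripting; the caller needs Accessibility permission, and reading the result list back would go through the AX tree or Notes' own scripting dictionary rather than keystrokes):

  # Rough sketch of one "muscle memory" sequence as a single command:
  # open Notes, hit Cmd+F, type the search term.
  import subprocess

  def notes_search(term: str) -> None:
      script = f'''
      tell application "Notes" to activate
      delay 0.3
      tell application "System Events"
          keystroke "f" using command down
          delay 0.2
          keystroke "{term}"
      end tell
      '''
      subprocess.run(["osascript", "-e", script], check=True)

  notes_search("meeting notes")  # reading results back would go via the AX tree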

zhxiaoliang 10 hours ago||
I'm always skeptical of the whole "computer use" concept. It's like hiring someone and inviting him to your house and telling him to go ahead, feel free to sleep on the bed, use the toilet, eat whatever is in the fridge, watch the TV, and oh here are the combinations for the safe... and that someone you hire is a monkey.
titzer 9 hours ago||
I feel like I am taking crazy pills. Are we really having an AI fart around with a mouse and clicking on things to accomplish stuff because we're not capable of making one kind of software query and command another piece of software? It kind of boggles my mind.
hnav 9 hours ago|||
The writing was on the wall with the MCP->CLI jump. The promise to investors is that you're replacing people. People don't make API calls.
ex-aws-dude 8 hours ago|||
You are, because that requires you to expose an API for every single piece of software ever.
eddythompson80 10 hours ago|||
But think of how comfortable and productive the monkey will feel. It might not be that hard to just build temp housing for it while you have monkey business to do.
andrekandre 9 hours ago||

  > build temp housing for it
everyone knows the real trouble starts when the monkey asks for the vote
nijave 10 hours ago||
In fairness, you're hoping the monkey does all the monkey tasks you'd rather not do yourself
mbgerring 8 hours ago||
Hello from the distant past, when being able to easily consume a website via API was an exciting and fresh idea for humans, before robots could effectively use the computer

https://en.wikipedia.org/wiki/HATEOAS

mbgerring 8 hours ago|
Does anyone remember the conference talk in the early days of React that was titled something like “best practices considered harmful,” or something? Or maybe that was a joke someone made about it. Anyway, the Semantic Web people have been right this whole time, and it’s very funny that we can now quantify the cost of building websites upside down and backwards for more than a decade.
rahulyc 14 hours ago|
All the websites currently blocking Claude Code or other AI agents are fighting a losing battle. Computer-use is in the early stages, and the thing preventing mass adoption seems to be the number of tokens it takes. Agents can fumble around trying 10 CLI commands that don't work before finding the right one and we barely notice. Other visual agents (browser use / computer use, etc.) also eventually fumble onto the right thing, but we don't have the patience to wait 20 minutes for them to click a button. As tokens get cheaper and faster, we'll probably get models that can use a UI just as natively as a CLI.
boringg 14 hours ago||
Tokens cheaper? I don't think that's the case ... VC-funded tokens were there to build a user base, and token prices will go up as they eventually switch from growth to profitability.
Aurornis 14 hours ago|||
I wish I could place a lot of money on the opposite side of this bet.

I don't think many realize how good the cheap, alternative models are becoming. I prefer SOTA models for key work, but I can also spend 10X as many tokens on an open model hosted by a non-VC-subsidized provider (who is selling at a profit) for tasks that can tolerate slightly less quality.

The situation is only getting better as models improve and data centers get built out.

caughtinthought 13 hours ago|||
What open source model and what non-subsidized provider specifically?
nijave 10 hours ago||
GLM 4.7 Flash is $0.07/1M tokens in, $0.40/1M tokens out on AWS Bedrock us-east-1. That's less than 1/10 the price of Haiku 4.5.

Bedrock isn't the cheapest either although I'm fairly sure they aren't being VC subsidized

There are definitely cheap tokens out there. The big gotcha is "for tasks that can tolerate slightly less quality"

EduardoBautista 13 hours ago||||
Yes, but how cheap is it to run four at the same time? It’s tough to run one good model locally, but running four at the same time which I commonly do with Claude and Codex just doesn’t seem to be happening anytime soon.
Aurornis 11 hours ago||
I'm referring to hosted models such as via OpenRouter or from the model providers' own services.

I think everyone making claims that inference is getting more expensive are unaware that there are more LLM providers than Google, Anthropic, and OpenAI.

boringg 14 hours ago|||
Fair - there are bets both ways, though I wouldn't consider it a certainty. The revenue drive behind this AI build-out is going to be real and multifold.
bheadmaster 14 hours ago|||
It will take a few years until scheduled data center construction finishes, and together with software optimizations that may come up in the meantime, it may cause a significant decrease in token price.
johnsmith1840 14 hours ago|||
And the lethal trifecta but I suppose that's all agents as of now anyhow. Every AI provider has major warnings about letting AI have access to PII in the browser.
faangguyindia 10 hours ago|||
Nobody can block the actual LLM providers; they use spoofed requests to scan the web for content, sometimes even using residential proxies.
nijave 10 hours ago||
Sure they can, proof of work seems to be effective. Anubis has become pretty popular
ls612 12 hours ago|||
They don’t need to be 100% effective they just need to make you afraid enough of being banned to not bother trying.
octoberfranklin 6 hours ago||
How do they know that the "you" accessing the site is the same "you" they previously banned?

Face-scanning? Iris patterns?

ls612 5 hours ago||
You used your credit card to buy whatever service or product they sell.
octoberfranklin 4 hours ago||
I hate to break it to you but it is really easy to get anonymous visa/mastercard cards.
jasomill 2 hours ago||
And easy to identify anonymous cards.

https://www.google.com/search?q=identify+anonymous+visa+mast...

einpoklum 13 hours ago||
> the thing preventing mass-adoption seems to be the number of tokens it takes.

Try the exorbitant expense and ballooning waste of generated electricity and usable water.
