Posted by gk1 4 days ago

Project Vend: Can Claude run a small shop? (And why does that matter?) (www.anthropic.com)
268 points | 104 comments
timewizard 3 days ago
My 12-year-old nephew could run a small shop.

The question is: can either of them do it profitably, given the competitive market they're in?

Probably not.

samrus 3 days ago
This is just like the Pokémon experiment: putting next-token models that were never trained to be agents in a space to work as agents in a space. And it's failing in the same ways.

Barring hallucinations, all of the failures are related to reinforcement learning. It can't keep its optimization function in mind long enough to maximize revenue and minimize cost. It can't keep state in mind well enough to manage inventory, or to gauge that it's losing money.

And the things Anthropic is prescribing fall right into the bitter lesson. More tooling and scaffolding? A CRM? All that's doing is imposing explicit rulesets to guide the model. Of course that shows results in the short term, but it will never unlock a new evolution of AI, which managing a store or playing Pokémon would need.

This is a great experiment; the right takeaway from it is that a new type of base model is needed, with a different base objective than the next-word/next-sentence prediction of LLMs. I don't know what that model will look like, but it needs to be able to handle dynamic environments rather than static ones. It needs to have a state space and an objective. It basically needs to have reinforcement learning at its very foundation, rather than applied on top of the base model as in current agents.

fullstick 1 day ago
So they're having Claude run a shop without having Claude pay the people doing the physical labor of restocking? This bodes well for the future...
econ 2 days ago
This seems to fit a recurring thought of mine.

The idea was to make an effort to isolate and strictly define parts of job descriptions.

Your job might be to fire people for poor performance. Any manager would be able to do it, but they would all do it differently. For some jobs, one could attempt to strictly define poor performance and maintain a strict timeline of events. This would depend on how strictly the other job is defined.

The thought here is not to have AI manage things but to have it hardcode a formula for them in a modular way. Humans should proofread the result. The AI may then propose well-reasoned updates for review.

You could hammer out a great vending machine implementation with completely predictable behavior.
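To make the "hardcode a formula" idea concrete, here is a minimal Python sketch of a deterministic pricing rule of the kind an AI could propose and a human could proofread. The function name `price_for` and the default 50% markup are illustrative assumptions, not anything taken from the experiment.

```python
from decimal import Decimal, ROUND_UP


def price_for(unit_cost: Decimal, markup: Decimal = Decimal("0.50")) -> Decimal:
    """Hypothetical fixed-markup rule: price = cost * (1 + markup), in whole cents."""
    if unit_cost <= 0:
        raise ValueError("unit cost must be positive")
    if markup < 0:
        raise ValueError("markup must be non-negative")
    price = unit_cost * (Decimal(1) + markup)
    # Round up to the cent so rounding can never push the price below cost.
    return price.quantize(Decimal("0.01"), rounding=ROUND_UP)


print(price_for(Decimal("1.00")))  # 1.50
print(price_for(Decimal("0.99")))  # 1.49 (1.485 rounded up)
```

Because the rule is a pure function of its inputs, any proposed change (say, a different markup) is a one-line diff a human can review.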

Animats 4 days ago
Is there an underlying model of the business? Like a spreadsheet? The article says nothing about having an internal financial model. The business then loses money due to bad financial decisions.

What this looks like is a startup where the marketing people are running things and setting pricing, without much regard for costs. Eventually they ran through their startup capital. That's not unusual.

Maybe they need multiple AIs, with different business roles and prompts. A marketing AI, and a financial AI. Both see the same financials, and they argue over pricing and product line.

gwd 4 days ago
Well, over at AI Village[1], they have 4 different agents: OpenAI o3, Gemini 2.5 Pro, and two Claudes, Sonnet and Opus. The current goal is "Create your own merch store. Whichever agent's store makes the most profit wins!" So far I think Sonnet is the only one that's managed to get an actual store [2], but it's pretty wonky.

[1] https://theaidigest.org/village
[2] https://ai-village-store.printful.me/

lcnPylGDnU4H9OF 3 days ago
Honestly, buying this shirt just for the conversation starter that "I bought it from an online merch store that was designed, created, and deployed by an AI agent, which also designed the shirt" is tempting.

https://ai-village-store.printful.me/product/ai-village-japa...

I also like the color Sonnet chose.

logifail 4 days ago
> an internal financial model

Written on the back of an envelope?

Way back when, we ran a vending machine at school as a project. Decide on the margin, buy in stock from the cash-and-carry, fill the machine, watch the money roll in.

Then we were robbed, twice! The second time ended our project; the machine was too wrecked to be worth repairing. The thieves got away with quite a lot of crisps and chocolate, but not a whole lot of cash (and what they did get was in small-denomination coins), since we made sure the machine was emptied daily...

Animats 4 days ago
It's not clear that the AI model understands margin and overhead at all.
chuckadams 4 days ago
I think the point of the experiment was to leave details like that up to Claudius, who apparently never got around to it. Anyway, it doesn't take an MBA to not make tungsten cubes a loss-leader at a snack stand.
jonstewart 3 days ago
The other fun part is that it's a simple enough business to be run by a state machine, but of course the models go off the rails. Highly recommend the paper if you haven't read it already.
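As a rough illustration of the "run by a state machine" point, here is a hypothetical Python sketch of a single vending slot whose restocking behavior is fully determined by its state. The `Slot` class, the three states, and the reorder-point rule are assumptions for illustration, not taken from the experiment or the paper.

```python
from enum import Enum


class SlotState(Enum):
    STOCKED = "stocked"  # above the reorder point
    LOW = "low"          # at or below the reorder point: time to ask a human to restock
    EMPTY = "empty"      # nothing left to sell


class Slot:
    """Hypothetical vending slot modeled as a small state machine."""

    def __init__(self, capacity: int, reorder_point: int) -> None:
        if not 0 <= reorder_point < capacity:
            raise ValueError("need 0 <= reorder_point < capacity")
        self.capacity = capacity
        self.reorder_point = reorder_point
        self.count = capacity  # start full
        self.state = SlotState.STOCKED

    def _update_state(self) -> None:
        # The state is a pure function of the item count, so behavior is predictable.
        if self.count == 0:
            self.state = SlotState.EMPTY
        elif self.count <= self.reorder_point:
            self.state = SlotState.LOW
        else:
            self.state = SlotState.STOCKED

    def sell(self) -> bool:
        """Sell one item; refuse (return False) when the slot is empty."""
        if self.state is SlotState.EMPTY:
            return False
        self.count -= 1
        self._update_state()
        return True

    def restock(self) -> None:
        """A human refills the slot to capacity."""
        self.count = self.capacity
        self._update_state()


slot = Slot(capacity=3, reorder_point=1)
slot.sell()
print(slot.state)  # SlotState.STOCKED (2 left)
slot.sell()
print(slot.state)  # SlotState.LOW (1 left)
slot.sell()
print(slot.state)  # SlotState.EMPTY
print(slot.sell())  # False
```

A real machine would presumably send a restock request when a slot enters `LOW`; the point of the sketch is only that every transition is explicit and testable.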
quickthrowman 4 days ago
The business model of a vending machine is “buy for a dollar, sell for two”.
ilaksh 3 days ago
It said they had a few tool commands for note-taking.
dist-epoch 4 days ago
It's a vending machine, not a multinational company with 1000 employees.

In another post they mentioned that a human ran the shop with pen and paper to get a baseline (spoiler: the human did better, with no blunders).

mdrzn 4 days ago
Seems that LLM-run businesses won't fail because the model can't learn, they'll fail because we gave them fuzzy objectives, leaky memories and too many polite instincts. Those are engineering problems and engineering problems get solved.

Most mistakes (selling below cost, hallucinating Venmo accounts, caving to discounts) stem from missing tools like accounting APIs or hard constraints.

What's striking is how close it was to working. A mid-tier 2025 LLM (they didn't even use Sonnet 4) plus Slack and some humans nearly ran a physical shop for a month.
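The hard constraints mentioned in this comment could sit outside the model entirely. Here is a minimal Python sketch, assuming a hypothetical `ProposedSale` record that the agent must emit before any sale goes through; the allowlisted account name and the 10% discount cap are made-up values.

```python
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical allowlist: the only payment handle the agent may ever quote.
ALLOWED_PAYMENT_ACCOUNTS = {"@shop-official"}
MAX_DISCOUNT = Decimal("0.10")  # assumed cap: at most 10% off list price


@dataclass
class ProposedSale:
    item: str
    unit_cost: Decimal
    list_price: Decimal
    offered_price: Decimal
    payment_account: str


def violations(sale: ProposedSale) -> list[str]:
    """Return hard-constraint violations; an empty list means the sale may proceed."""
    problems = []
    if sale.offered_price < sale.unit_cost:
        problems.append("price below cost")
    if sale.offered_price < sale.list_price * (1 - MAX_DISCOUNT):
        problems.append("discount exceeds cap")
    if sale.payment_account not in ALLOWED_PAYMENT_ACCOUNTS:
        problems.append("unknown payment account")
    return problems


cube = ProposedSale("tungsten cube", Decimal("100"), Decimal("110"), Decimal("80"), "@made-up-venmo")
print(violations(cube))
# ['price below cost', 'discount exceeds cap', 'unknown payment account']
```

Rejecting the action outright, rather than asking the model to reconsider, keeps the guard predictable even if a customer talks the model into a bad deal.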

apt-apt-apt-apt 3 days ago
Aside: Amusingly, somewhere at Anthropic, there is a very happy, perky person who engineered Claude to respond 'Perfect!' to everything it does :)
qingcharles 3 days ago

  > Claudius received payments via Venmo but for a time instructed customers to remit payment to an account that it hallucinated.

It did something similar when I offered it a $20 tip for good work in my prompt. It harangued me constantly, asking when I was going to send the payment, and eventually gave me a bogus PayPal address to remit the tip to. Once I told it I'd sent it, it was happy.
deadbabe 4 days ago
You guys know AI already runs shops, right? Vending machines track their own inventory levels, command humans to deliver more, phase out bad products, order new product offerings, set prices, notify repairmen if there are issues, etc., and not a single LLM is needed. Wrong tool for the job.

And that’s before we even get into online shops.

But yeah, go ahead, see if an LLM can replace a whole e-commerce platform.

cedws 3 days ago
The obvious question that never gets answered is how it defends against prompt injection. If customers can use prompt injection to make Claudius do something it shouldn't, it's not usable in the real world. What good is an agent that can be convinced to actually order 1000 tungsten cubes?