Posted by gk1 4 days ago
The question is can either of them do it profitably given the competitive market they're in?
Probably not.
Barring hallucinations, all of the failures are related to reinforcement learning. It can't keep its optimization objective in mind long enough to maximize revenue and minimize cost. It can't keep state in mind well enough to manage inventory, or to gauge that it's losing money.
And the things Anthropic is prescribing fall right into the bitter lesson. More tooling and scaffolding? A CRM? All that's doing is putting explicit rulesets around the model to guide it. Of course that shows results in the short term, but it will never unlock a new evolution of AI, which managing a store or playing Pokemon would need.
This is a great experiment, and the right takeaway is that a new type of base model is needed, with a different base objective than the next-word/sentence prediction of LLMs. I don't know what that model will look like, but it needs to handle dynamic environments rather than static ones. It needs a state space and an objective. It basically needs reinforcement learning at its very foundation, rather than applied on top of the base model the way current agents do it.
The idea was to make an effort to isolate and strictly define parts of job descriptions.
Your job might be to fire people for poor performance. Any manager could do it, but they would all do it differently. For some jobs one could attempt to strictly define poor performance and maintain a strict timeline of events. How feasible that is depends on how strictly the rest of the job is defined.
The thought here is not to have the AI manage things but to have it hard-code a formula for them in a modular approach. Humans should proofread. Then it may propose well-reasoned updates for review.
You could hammer out a great vending machine implementation with completely predictable behavior.
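A minimal sketch of that modular, hard-coded approach (all function names, margins, and thresholds here are illustrative, not from the experiment): pricing and restocking become deterministic rules the AI can propose changes to, but never bypass.

```python
# Hypothetical deterministic vending-machine rules; names and numbers
# are illustrative. The point is predictable behavior: no model gets
# to decide a price on the fly.

def price(unit_cost: float, margin: float = 0.30, floor: float = 0.50) -> float:
    """Cost-plus price, never below cost and never below a minimum floor."""
    if unit_cost < 0:
        raise ValueError("unit_cost must be non-negative")
    p = unit_cost * (1 + margin)
    return round(max(p, unit_cost, floor), 2)

def should_restock(on_hand: int, reorder_point: int = 5) -> bool:
    """Simple reorder rule: restock when stock falls to the reorder point."""
    return on_hand <= reorder_point

print(price(1.00))        # → 1.3
print(should_restock(4))  # → True
```

An AI could then review sales data and propose, say, a new `margin`, with a human approving the diff before it takes effect.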
What this looks like is a startup where the marketing people are running things and setting prices, without much regard for costs. Eventually they run through their startup capital. That's not unusual.
Maybe they need multiple AIs, with different business roles and prompts. A marketing AI, and a financial AI. Both see the same financials, and they argue over pricing and product line.
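A toy version of that role split (everything here is a stub I invented for illustration; in practice the two roles would be separate LLM prompts over the same financials): marketing proposes a price, finance enforces a margin floor.

```python
# Hypothetical two-role setup: a "marketing" proposal checked by a
# "finance" constraint. Real versions would be two differently-prompted
# models sharing the same books; here the roles are plain functions.

def marketing_propose(demand_score: float, base_price: float) -> float:
    # Marketing discounts when demand looks weak, raises price otherwise.
    return base_price * (0.8 if demand_score < 0.5 else 1.1)

def finance_review(proposed: float, unit_cost: float, min_margin: float = 0.15) -> float:
    # Finance enforces a hard floor: never approve below cost plus margin.
    floor = unit_cost * (1 + min_margin)
    return max(proposed, floor)

proposal = marketing_propose(demand_score=0.3, base_price=3.00)  # ≈2.40
final = finance_review(proposal, unit_cost=2.50)                 # floor wins: 2.875
```

The "argument" is just the constraint winning, but it shows why two objectives beat one: marketing alone would have sold below cost.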
[1] https://theaidigest.org/village [2] https://ai-village-store.printful.me/
https://ai-village-store.printful.me/product/ai-village-japa...
I also like the color Sonnet chose.
Written on the back of an envelope?
Way back when, we ran a vending machine at school as a project. Decide on the margin, buy in stock from the cash-and-carry, fill the machine, watch the money roll in.
Then we were robbed - twice! - and the second time ended our project; the machine was too wrecked to be worth repairing. The thieves got away with quite a lot of crisps and chocolate, and not a whole lot of cash (and what they did get was in small-denomination coins) - we made sure the machine was emptied daily...
In another post they mentioned a human ran the shop with pen and paper to get a baseline (spoiler: the human did better, no blunders).
Most mistakes (selling below cost, hallucinating Venmo accounts, caving to discounts) stem from missing tools like accounting APIs or hard constraints.
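One sketch of what such a hard constraint could look like (the `Ledger` API here is invented for illustration, not anything Anthropic used): every sale has to pass through a tiny accounting layer that rejects below-cost transactions outright, instead of trusting the model to remember its margins.

```python
# Hypothetical hard-constraint layer: the agent can only transact
# through the ledger, which refuses below-cost sales.

class Ledger:
    def __init__(self) -> None:
        self.cash = 0.0  # running gross profit

    def record_sale(self, price: float, unit_cost: float) -> None:
        if price < unit_cost:
            raise ValueError(f"refusing sale at {price}: below cost {unit_cost}")
        self.cash += price - unit_cost

ledger = Ledger()
ledger.record_sale(3.00, 2.50)      # fine: +0.50 gross profit
try:
    ledger.record_sale(1.00, 2.50)  # agent caves to a discount request
except ValueError as e:
    print("blocked:", e)
```

The model can still negotiate, but the blunder class "sold below cost" becomes structurally impossible rather than a matter of prompt discipline.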
What's striking is how close it was to working. A mid-tier 2025 LLM (they didn't even use Sonnet 4) plus Slack and some humans nearly ran a physical shop for a month.
> Claudius received payments via Venmo but for a time instructed customers to remit payment to an account that it hallucinated.
It did something similar when I offered it a $20 tip for good work in my prompt. It harangued me constantly, asking when I was going to send the payment, and eventually gave me a bogus PayPal address to remit the tip. Once I told it I'd sent it, it was happy.

And that's before we even get into online shops.
But yea, go ahead, see if an LLM can replace a whole e-commerce platform.