
Posted by gk1 4 days ago

Project Vend: Can Claude run a small shop? (And why does that matter?) (www.anthropic.com)
269 points | 104 comments | page 3
due-rr 4 days ago|
Would you ever trust an AI agent running your business? As hilarious as this small experiment is, is there ever a point where you can trust it to run something long term? It might make good decisions for a day, month or a year and then one day decide to trash your whole business.
keymon-o 4 days ago||
I’ve just written a small anecdote with GPT3.5, where it lost count of a trivial, incrementally updated item quantity within just a few prompts. It might get better by orders of magnitude from now on, but who’s gonna pay for ‘that one eventual mistake’?
croemer 4 days ago||
GPT3.5? Did you mean to send this 2 years ago?
keymon-o 4 days ago||
Maybe. Did LLMs stop with hallucinations and errors 2 years ago?
marinmania 4 days ago|||
It does seem far more straightforward to say "Write code that deterministically orders food items that people want and sends invoices etc."

I feel like that's more the future. Having an agent sorta make random choices feels like LLMs attempting to do math, instead of LLMs attempting to call a calculator.

keymon-o 4 days ago|||
Every output that is going to be manually verified by a professional is a safe bet.

People forget that we use computers for accuracy, not smarts. Smarts make mistakes.

standardUser 4 days ago|||
Right, but if we limit the scope too much we quickly arrive at the point where 'dumb' autonomy is sufficient instead of using the world's most expensive algorithms.
throwacct 4 days ago||
I don't think any decision maker will let LLMs run their business. If the LLMs fail, you could potentially lose your livelihood.
xyst 4 days ago||
Bye bye, B2B. Say hello to Ai2Ai.

No humans at all. Just Ai consuming other Ai in an "ouroboros" fashion.

12_throw_away 4 days ago|
Thanks to the tech industry, the future is bright! We'll get Infinite Paperclips, Terminator, and Idiocracy - all at the same time!
ElevenLathe 4 days ago||
The "April Fools" incident is VERY concerning. It would be akin to your boss having a psychotic break with reality one day and then resuming work the next. They also make a very interesting and scary point:

> ...in a world where larger fractions of economic activity are autonomously managed by AI agents, odd scenarios like this could have cascading effects—especially if multiple agents based on similar underlying models tend to go wrong for similar reasons.

This is a pretty large understatement. Imagine a business that is franchised across the country with each "franchisee" being a copy of the same model, which all freak out on the same day, accuse the customers of secretly working for the CIA, and decide to stop selling hot dogs at a profit and instead sell hand grenades at a loss. Now imagine 50 other chains having similar issues while AI law enforcement analysts dispatch real cops with real guns to the poor employees caught in the middle schlepping explosives from the UPS store to a stand in the mall.

I think we were expecting SkyNet but in reality the post-AI economy may just be really chaotic. If you thought profit-maximizing capitalist entrepreneurs were corrosive to the social fabric, wait until there are 10^10 more of them (unlike traditional meat-based entrepreneurs, there's no upper limit and there can easily be more of them than there are real people) and they not-infrequently act like they're in late stage amphetamine psychosis while still controlling your paycheck, your bank, your local police department, the military, and whatever is left that passes for the news media.

Deeper, even if they get this to work with minimal amounts of synthetic schizophrenia, do we really want a future where we all mainly work schlepping things back and forth at the orders of disembodied voices whose reasoning we can't understand?

lukaspetersson 4 days ago|
We are working on it! /Andon Labs
ilaksh 4 days ago||
It would be cool to get a follow-up on how long it's been since this write-up and how well it's been doing since they revised the prompts and tools. Anyone know someone from Andon Labs?
bitwize 4 days ago||
"I have fun renting and selling storage."

https://stallman.org/articles/made-for-you.html

C-f Storolon

korse 4 days ago||
>The most precipitous drop was due to the purchase of a lot of metal cubes that were then to be sold for less than what Claudius paid.

Well, I'm laughing pretty hard at least.

fakedang 3 days ago|
Isn't that the Silicon Valley playbook though? I don't see anything wrong with what Claudius did. /s
gavinray 4 days ago||
The identity crisis bit was both amusing and slightly worrying.
gausswho 4 days ago|
The article claimed Claudius wasn't having a go for April Fools - that it claimed to be doing so after the fact as a means of explaining (excusing?) its behavior. Given what I understand about LLMs and intent, I'm unsure how they could be so certain.
tough 4 days ago||
it's a word soup machine

llm's have no -world models- can't reason about truth or lies. only encyclopedic repeating facts.

all the tricks CoT, etc, are just, well tricks, extended yapping simulating thought and understanding.

AI can give great replies, if you give it great prompts, because you activate the tokens that you're interested with.

if you're lost in the first place, you'll get nowhere

for Claude, continuing the text with making up a story about being April fools, sounds the most plausible reasonable output given its training weights

gausswho 3 days ago||
But why is the conclusion that Claudius is 'making up a story about being April Fools'? Maybe this wasn't an identity crisis, just a big human whoosh?
kashunstva 4 days ago||
> Can Claude run a small shop?

Good luck running anything where dependability on Claude/Anthropic is essential. Customer support is a black hole into which the needs of paying clients disappear. I was a Claude Pro subscriber, using it primarily for assistance in coding tasks. One morning I logged in, while temporarily traveling abroad, and… I’m greeted with a message that I have been auto-banned. No explanation. The recourse is to fill out a Google form for an appeal but that goes into the same black hole into which all Anthropic customer service goes. To their credit they refunded my subscription fee, which I suppose is their way of escaping from ethical behaviour toward their customers. But I wouldn’t stake any business-critical choices on this company. It exhibits the same capricious behaviour that you would expect from the likes of Google or Meta.

fhd2 4 days ago|
Give them a year or two. Once they've figured out how to run a small shop, I'm sure it'll just take a bit of additional scaffolding to run a large infrastructure provider.
wewewedxfgdf 4 days ago|
Instead of dedicating resources to running AI shops, I'd like to see Anthropic implement "Download all files" in Claude.
ed_mercer 4 days ago|
Can you elaborate? Surely this is possible.