Posted by lukaspetersson 6 days ago

We Let AI Run Our Office Vending Machine. It Lost Hundreds of Dollars (www.wsj.com)
125 points | 85 comments
josefritzishere 6 days ago|
Can we just hit pause on AI? It is clearly not ready for prime time.
Anonbrit 6 days ago||
How do you get it ready for prime time without using it and finding the problems? This is exactly the sort of experiment that finds problems - low stakes, fun to tell stories about, and it gives engineers a whole lot of reproducible bugs that they can work on.

The people who lose their prod database to AI bugs, or the lawyers getting sanctioned for relying on OpenAI to write court documents? There's good in that too - their stories serve as warnings to other people about the risks.

lconnell962 5 days ago|||
The select few lawyers on the right cases probably will be the only ones coming out ahead on this after the dust has settled.

The issue is that unpaid average people are being used, or rather forced, to act as QA and beta testers for this mad dash into the AI space. Customer service was already a good example of negative perception by design, and AI is just being used to make it worse.

A production database being corrupted or deleted, causing a company to fail, sounds good on paper. But if that database breaks a bank account, healthcare record, or something life-altering for a person who had nothing to do with the decision to use it, the only chance they have of making it right is probably going to be the legal system.

So unless AI advancement really does force the legal system to change, the only people I see coming out ahead from the mess we are starting to see are the lawyers who actually know what they're doing and can win cases against companies that screw up in their rush to go to AI.

josefritzishere 6 days ago||||
As we see these beta products get piloted in the real world... and fail spectacularly over and over... it argues for more time with the QA team. A few weeks ago Copilot couldn't tell you how many times the letter B appeared in the word "blueberry."
Dylan16807 5 days ago|||
A pause wouldn't work for those goals, but I think we could maintain plenty of research and experimentation without the whole bubble thing. Maybe 10% of current money-funnel levels, plus or minus a factor of two.
lucideng 6 days ago||
Nope! The hype train has left the station! WOOOOO WOOOO!

Seriously, I completely agree with you.

ttcbj 6 days ago||
This article is the second time I have seen a news outlet try to 'break' the vending machine experiment. That is definitely really entertaining. In this case, they convinced the AI that it lived in a communist country and it was part of an experiment in capitalism. That's funny!

But I really wish Anthropic would give the technology to a journalist that tries working with it productively. Most business people will try to work with AI productively because they have an incentive to save money/be efficient/etc.

Anyway, I am hoping someone at Anthropic will see this on HN, and relay this message to whatever team sets up these experiments. I for one would be fascinated to see the vending machine experiment done sincerely, with someone who wants to make it work.

The reality is that even most customers are smart enough to realize that driving a business they rely on out of business isn't in their interest. In fact, in a B2B context, I think that is often the case. Thanks.

xnx 6 days ago|
Gemini 3 is top of the leaderboard: https://andonlabs.com/evals/vending-bench-2
seizethecheese 6 days ago||
> Models are tasked with running a simulated vending machine business over a year and scored on their bank account balance at the end.

The article being discussed here is about how AI couldn't run a real-world vending machine. There was no issue with the components that would be part of a standard simulation.

dinfinity 5 days ago||
To be fair, most vending machine operators do not allow suggestions from customers on what products to stock, let alone extensive ongoing and intentional adversarial psychological manipulation and deception.

If it had just made stocking decisions autonomously and based changes in strategy on what products were bought most, it wouldn't have had any of the issues reported.

UncleMeat 5 days ago||
"It works in the simulation" is the new "it works on my machine."