Posted by speckx 5 days ago

Prepare for That Stupid World(ploum.net)
174 points | 94 comments | page 2
mlsu 5 days ago|
This piece is pretty ineffective. Not that I like the world of "AI"; I probably share the author's opinion that it's just another evolution in the bullshittification of the human experience.

But the point of the article is not that you would implement an agent-based vending machine business. Humans restock the machine because it's a red-team exercise. As a red-team exercise it looks very effective.

> Why do you ever want to add a chatbot to a snack vending machine? The video states it clearly: the vending machine must be stocked by humans. Customers must order and take their snack by themselves. The AI has no value at all.

Like this is like watching the simpsons and being like "why are the people in the simpsons yellow? people in real life aren't yellow!!"

The point isn't to run a profitable vending machine, or even validate that an AI business agent could become profitable. The point is to conduct an experiment and gather useful information about how people can pwn LLMs.

At some level the red team guy at Anthropic understands that it is impossible by definition for models to be secure, so long as they accept inputs from the real world. Putting instructions into an LLM to tell it what to do is the equivalent of exposing an `eval()` to a web form: even if you have heuristics to check for bad input, you will eventually be pwned. I think this is actually totally intractable without putting constraints on the model from outside. You'll always need a human in the loop to pull the plug on the vending machine when it starts ordering PlayStations. The question is how you improve that capability, and that is the Anthropic red-team guy's job.
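
A minimal sketch of what "constraints from outside the model" might look like, assuming the agent can only propose an item name and a quantity. The catalog, the spending cap, and the `review_order` function are all hypothetical names made up for illustration; nothing here comes from the actual Project Vend setup.

```python
# Hypothetical trusted catalog (item -> unit price in dollars).
# It is maintained by humans, so the model cannot invent items or prices.
CATALOG = {
    "chips": 1.50,
    "soda": 2.00,
    "candy bar": 1.25,
}

# Hypothetical per-order spending cap, chosen only for illustration.
MAX_ORDER_TOTAL = 50.00


def review_order(item: str, quantity: int) -> str:
    """Return "approve" or "escalate" for an order proposed by the agent.

    The check is plain code running outside the model, so no prompt
    can talk it out of the rules. Anything unusual goes to a human.
    """
    if item not in CATALOG:
        return "escalate"  # unknown item, e.g. a game console
    if not isinstance(quantity, int) or quantity <= 0:
        return "escalate"  # malformed quantity
    if CATALOG[item] * quantity > MAX_ORDER_TOTAL:
        return "escalate"  # over the spending cap
    return "approve"


if __name__ == "__main__":
    for proposal in [("chips", 20), ("playstation", 1), ("soda", 100)]:
        print(proposal, "->", review_order(*proposal))
```

The design choice is that the model never supplies the price or decides what counts as a snack; it only proposes, and a human sees everything that falls outside the allowlist or the budget.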

layer8 5 days ago|
> The point isn't to run a profitable vending machine, or even validate that an AI business agent could become profitable.

Having an AI run an organization autonomously is exactly the point of Andon Labs [0], who provided the system that WSJ tested.

[0] https://andonlabs.com/

mrandish 5 days ago||
I read that WSJ article before seeing this blog post. I found it mildly interesting and a little bit funny but unsurprising that the AI failed. However, I think this blog about the article misses a key point. Anthropic's goal was never to develop an AI-based vending machine. The WSJ clearly says:

> "Logan Graham, head of Anthropic’s Frontier Red Team, told me the company chose a vending machine because it’s the simplest real-world version of a business. “What’s more straightforward than a box where things go in, things go out and you pay for them?” he said."

This was a project of Anthropic's Red Team, not a product development team. Deploying the AI in a vending machine context was chosen as a minimal "toy model" with which to expose how LLMs can't even handle a grossly simplified "business" with the fewest possible variables.

> "That was the point, Anthropic says. The Project Vend experiment was designed by the company’s stress testers (aka “red team”) to see what happens when an AI agent is given autonomy, money—and human colleagues."

Anthropic had already done this experiment internally and it succeeded - by failing to operate even the simplest business but doing so in ways that informed Anthropic's researchers about failure modes. Later, Anthropic offered to allow the WSJ to repeat the experiment, an obvious PR move to promote Anthropic's AI safety efforts by highlighting the kinds of experiments their Red Team does to expose failure modes. Anthropic knew it would fail abjectly at the WSJ. The whole concept of an AI vending machine with the latitude to set prices, manage inventory and select new products was intended to be ludicrous from the start.

spit2wind 5 days ago||
Excuse me if someone already asked and I missed it: how does one prepare for such a world?

Is it some Viktor Frankl level acceptance or should I buy a copy of the Art of Electronics or what?

Advice welcome.

chunkmonke99 5 days ago|
I don't think there is anything more than the standard advice. Just stay curious, make friends/build a community, keep learning, stay healthy. Why not get the AoE? You can also check out "Practical Electronics for Inventors": AoE assumes you have some electronics background, imo. But seriously, I don't get the doom/gloom: things are going to be rough ... but maybe they won't? Many things I learned I did for their own sake! Things have always been uncertain and absurd; I guess we might as well embrace it!
conorcleary 5 days ago||
(also get age of empires)
chunkmonke99 4 days ago||
Hahah absolutely!! Man that brings back memories.
brador 5 days ago||
It was always tasks reaching obsolescence, but now it’s the human organism. But the human as a unit is the only known conscious being in the universe, the only entity capable of generating meaningful goals (even if only to them) not related to the 4fs.

Humans were just not needed anymore, and it terrifies.

sallveburrpi 5 days ago||
Beings other than humans have demonstrated consciousness and “meaningful” goals. Crows for instance, but there are many others.

Humans were never needed (for what?)

neogodless 5 days ago||
What are "4fs"? Is that the "4X" e.g. games where you eXplore, eXpand, eXploit, eXterminate?
stryan 5 days ago||
The four basic actions in evolutionary biology: Feeding, Fleeing, Fighting, "Mating".
snickerbockers 5 days ago||
Someday the McDonald's kiosk will want to be your friend. It will remember who you are and ask you how your kids are doing. It will recommend new specials and maybe even give you "special friend" deals. And I'll just tell it to shut the fuck up and queue me an order for the Egg McMuffin combo with a coffee and the fried potato patty because this bullshit is fucking obnoxious.
sanbor 5 days ago||
I have a different point of view. This was a test to see if the AI could perform a specific task. Asking AI to draw a pelican riding a bike is another test. I find the experiment interesting because it proves that currently LLMs are not able to perform a simple task reliably for a long period of time.

If the journalist was not asking the right questions, or it was too obvious that the article was PR, that's another matter (I haven’t read WSJ’s piece, only the original post by Anthropic).

ursAxZA 5 days ago||
If vending machines are the benchmark now, the logical next step is obvious: let AI run AI.
littlecranky67 5 days ago||
I recently contacted the official support email (support@bunq.com) of Bunq, a neobank (like N26 and Revolut). Because they notified me that they had changed their T&C, and I never really used the account after the KYC (because they rejected my tax filings), I figured I would let them know that I do not agree to the new T&C and want to terminate my account and have my data deleted.

Since the T&C update came - of course - from no-reply@bunq.com, I went to their website and quickly found out that, unless I install their app again, there is no way to do anything. After I installed the app, they wanted me to record a selfie, because I was using the app from a new device. I figured that it is a lot of work and somewhat unreasonable to record a new selfie just to have my data deleted - so I found their support@bunq.com address.

And, of course, you guessed it, it is 100% a pure AI agent at borderline retard level. Even though it is email, you get AI answers back. My initial inquiry, stating that I decline the T&C and want to terminate my account and have my data deleted via GDPR request, was answered with a completely hallucinated link: bunq.com/dataprotection, which resulted in an immediate 404. I replied to that email that it is a 404, and the answer was pretty generic; that, together with the fact that all responses seemed to arrive within 5 minutes, made me suspect it is AI. I asked it what 5 plus 5 is, and yes, I got a swift response with the correct answer. My question about which AI version and LLM it was got cleverly rejected. Needless to say, it was completely impossible to get anything done with that agent. Because I CC'ed their privacy officer (privacy@bunq.com), I did get a response a day later asking me basically for everything again that I had already told the AI agent.

Now, I never had any money in that account so I don't care much. But I can hardly see trusting a single buck to a bank that would offer that experience.

kittikitti 5 days ago|
This is a great take and one that I align with when it comes to the AI vending machine experiment. Journalism in English has become a mouthpiece for fascist leaders and corporations, nothing more. Places like The New York Times have incredible gaps in their journalism at the price of increasing shareholder value.