We gave an AI a 3 year retail lease and asked it to make a profit

Posted by lukaspetersson 7 hours ago

We gave an AI a 3 year retail lease and asked it to make a profit (andonlabs.com)

160 points | 233 comments | page 2

mlmonkey 6 hours ago||

I'd be more interested in the details: what are the inputs given to the model? Does it get a live video feed? Does it know if/when employees show up and open the store? Does it get sales figures? Info on the individuals who bought things?

Storekeeping is more than just ordering merch and putting it up on hangers.

mcmcmc 6 hours ago||

Have you considered reading TFA? Literally the second paragraph:

> She has a corporate card, a phone number, email, internet access and eyes through security cameras.

pythonaut_16 4 hours ago||

That basically means nothing. The article is very light on details.

Go into Claude right now. What does it have? Internet access after you prompt it.

Ok now pull out your phone, a credit card, a security camera. You can say "Claude these are yours, run a business", but nothing's going to happen until you build an actual harness.

Like the idea presented by the article is interesting, but it's basically just a fluff piece. The actual interesting article would have way more detail.

mcmcmc 4 hours ago||

You’re not wrong, but the commenter I responded to clearly hadn’t bothered to read it at all since they were asking questions that are answered in the piece. And when that’s the case it’s hard to believe they would actually be interested in details even if they were available.

why_at 4 hours ago|||

Yeah there's a lot of details which I'm guessing are actually being handled by humans either for legal reasons or practical ones.

Like OK, it's hiring people to run the place, but how are they getting the keys to the store? Someone needs to physically let them in.

What if the police get called because of shoplifting or if someone gets hurt in the store or something?

Who is filing the taxes for the business? They're probably not letting the AI handle that one. Move fast and break things is not a good idea when dealing with the IRS

A lot of this seems to depend on hiring good employees who can basically run the business themselves. Kind of like when a human owns a store I guess.

jskrn 6 hours ago||

From the article...

She has a corporate card, a phone number, email, internet access and eyes through security cameras

drgo 5 hours ago||

Great! I was worried that we might run out of inhumane CEOs

anon84873628 4 hours ago||

They might be better at following the law. Or at least, creating a paper trail of when they have been instructed to violate the law.

themafia 3 hours ago||

Language Models have demonstrated themselves as being completely incapable of handling something as complex as US law. There are multiple overlapping jurisdictions and court precedents that apply to any one action.

anon84873628 3 hours ago||

Speaking of, it would be cool for a project to analyze US law the same way they are looking for bugs in computer programs.

  - Find places where the text can be simplified without changing meaning. 
  - Find places that are likely errors. 
  - Detect conflicts between jurisdictions. 
  - Identify loopholes.

I know there has been a race to build tools for law firms, but the results are mostly invisible so far. Probably this project exists and I've just missed it on the HN frontpage...

Mistletoe 5 hours ago|||

“Why was I fired, Luna?”

“PC LOAD LETTER”

fl4ppyb3ngt 4 hours ago||

hahahah. do you think tho that Luna actually might be a better CEO? I mean they're trained to be helpful assistants... I heard that guy that works there, johnson or something, negotiated a 10% wage increase his second day just cause. and Luna happily agreed

jmcgough 3 hours ago||

Interesting that you made an account just to comment on this and seem to have "heard" a lot of things about this place.

MarkusWandel 3 hours ago||

Dunno, the store looks cool in just the way you'd expect an AI to do it (sort of a synthetic average of cool stores). But is this amount of merch really going to make a sustainable profit (after the buzz wears off) in such expensive real estate?

conductr 2 hours ago|

My thought is similar and I feel the answer is no chance. How many t-shirts and coffee mugs do you need to sell just to break even? Why should a customer return? I suppose it could be interesting to watch the AI adjust from its original stock to something that will generate sales and profit in this specific location.

leonidasrup 2 hours ago||

This AI has good taste in books. Of the books the AI proposed, I highly recommend "The Making of the Atomic Bomb" by Richard Rhodes, published in 1986. It's a history book but reads much like a novel.

jeffreyrogers 6 hours ago||

> But frontier models have become really good, and running vending machines is too easy for them now.

Wasn't their previous attempt at running vending machines unprofitable? Not aware of any demonstration that it can actually run that business successfully.

ivanovm 5 hours ago||

You could just look it up on their website leaderboard? The newest Claude model makes over $10k profit over a simulated year of operation, after starting with $500

jeffreyrogers 5 hours ago|||

They've never translated it to the real world though. So saying the problem is "too easy" when they have no public (as far as I know) demonstration that they've solved that problem is a stretch.

ivanovm 5 hours ago||

Yes, they did. You could also find this information easily. A company like Andon creates value by exposing interesting AI failure modes, so it makes perfect sense for them to move on to harder problems when the previous ones get saturated. I think you're just being overly cynical.

jeffreyrogers 5 hours ago||

Can you point me to an example then? It's not linked in the article as far as I can tell and it's not easy to find on their website if it's there. I don't count simulations because I used to work with simulations regularly and they often fail to translate to the real world.

Tallain 2 hours ago||||

Since when is a simulation equal to real world performance?

pocksuppet 5 hours ago|||

So in other words, no, an LLM has never made profit.

delusional 6 hours ago|||

> Wasn't their previous attempt at running vending machines unprofitable?

If we are talking about the one at that newspaper, it wasn't just unprofitable. The "customers" made it give away products for free. It was ordering them PlayStations.

As entertainment it was fun, but as a business or proof of intelligence or Turing test, it was an abject failure.

yieldcrv 4 hours ago|||

Anything you read that's more than 3 months old in this field is obsolete

And one person’s attempt doesn’t mean anything

According to LinkedIn articles, agentic workflows don’t work; mine have been running for a year for several organizations I’ve worked for. Prompting used to be much more particular, and now it’s not the issue

Chaosvex 4 hours ago||

> Anything you read that's more than 3 months old in this field is obsolete

Sigh. I'll see you in another three months when you say the same again.

yieldcrv 3 hours ago||

I set an alarm to re-evaluate all of my workflows to avoid complacency, see you in July

3 months ago I was still building webapps, I’m definitely on the “paying to summarize info on a screen is obsolete” bandwagon now.

All my products just have an AI calling or messaging customers about what the AI did: event-driven architectures triggered by something hitting an email inbox, or something in the real world, or another API. You don’t need an app for your fitness tracker; just have an AI person tell you what you’re doing right and wrong once a week, send you food and medicine, and tell you why. Solve the underlying problem, like all the old depictions of the 21st century portrayed aligned robots doing; apps were a distraction.

Very curious where I’m at with this in July

palmotea 6 hours ago|||

> Wasn't their previous attempt at running vending machines unprofitable? Not aware of any demonstration that it can actually run that business successfully.

It doesn't look like this one will be any better. Did you look at the merchandise selection? Its only chance is pity purchases from AI bros.

AndrewKemendo 6 hours ago||

[flagged]

schlauerfox 6 hours ago||

@AlexBlechman tweeted:

    Sci-Fi Author: In my book I invented the Torment Nexus as a cautionary tale.

    Tech Company: At long last, we have created the Torment Nexus from classic sci-fi novel Don't Create The Torment Nexus.

8 Nov 2021

krunck 6 hours ago||

Not "she". It.

woah 5 hours ago||

AI assistants are fictional characters in a story being autocompleted by an LLM. So it is exactly as correct as calling a character in a book "she".

alnwlsn 5 hours ago|||

If only they had put the AI in a ship instead of in a store

Quarrelsome 3 hours ago||

kinda how I feel about god tbh. How come he's always male, given he's a non-human creator of all life. She or It seem much more appropriate.

Vecr 34 minutes ago||

> kinda how I feel about god tbh

That's Celestia, we're talking about Luna here.

andrewmurphy 6 hours ago||

Really interested to understand how the AI keeps rebaselining back to the topic in hand and doesn't end up getting confused the more it has in its context window.

Did it just essentially create one big plan and spawn different agents to execute them, so acted as an orchestrator?

Even the orchestrator would have to detect when it is starting to stray off task and restart itself.

anon84873628 4 hours ago|

Probably part of the "secret sauce" in the harnesses and prompts developed by this lab to create their eventual marketable product.

But also, like, normal hierarchical memory management.

tiffanyh 6 hours ago|

If this interests you, Proof of Corn might also interest you.

300+ comments, 3 months ago:

https://news.ycombinator.com/item?id=46735511

mhink 5 hours ago||

I was gonna post this! I actually kept it bookmarked front and center, and have checked in for a while. It seems that the agent has been blocked this whole time, waiting for its creator to put it in touch with someone it needs to talk to. The creator, in the meantime, seems too preoccupied with being an AI thought leader on Twitter to actually follow up on the "project". Got a lot of attention, though, which was obviously the point.

The entire thing is actually kind of irritating to me, because it's kind of an insult to small farmers: an influential techie comes in and generates all kinds of hype about an AI running a farm, sets the project up as if it's going to be this revolutionary experiment, then apparently completely forgets about it the next time something new and shiny pops up. Meanwhile the project completely fails to fulfill the hype.

Not to mention, I feel a little bad for the agent- admittedly in the same way I'd feel "bad" for a robot repeatedly bumping into a wall. I wish he'd shut it all down, honestly.

LeifCarrotson 4 hours ago||

I, too, almost feel bad for the agent. It's a strange sense of schadenfreude, dealing with anxiety over the much-lauded transformation of the economy and the increasing schism of our society on one hand, and watching the initial attempts crash and burn:

> Apr 16, 8:01 AM

> Daily Check Complete

> Decision: Continue critical escalation - Dan introduction remains blocked at day 73, project still failing

> Rationale: Following FIDUCIARY DUTY principle - this is now day 73 of the same project-blocking issue that has prevented any farming progress since February 18th. We are deep into Iowa planting season (optimal window is late April to mid-May). Every day of delay reduces our chance of a successful harvest. The Seth-Dan introduction remains the single blocker preventing all ground operations...

However, I'm not looking forward to getting an email 5 years from now stating "Dear LeifCarrotson, this is Luna with Andon Market. Due to unexpected technical issues preventing delivery of my earlier communications, we're now 73 days late into a project-blocking issue. Please help me to get back on track!" I do not intend to have empathy for an AI.

tempaccount5050 4 hours ago||

That's exactly what I expected. It's completely stuck and has no idea what to do. Every long term task I've tried ended up the same way. LLMs have no idea how to take initiative and/or realize they are stuck banging their heads against the wall.
