"Token anxiety", a slot machine by any other name

Posted by presbyterian 4 hours ago

"Token anxiety", a slot machine by any other name(jkap.io)

76 points | 61 comments

ctoth 2 hours ago|

The gambling analogy completely falls apart on inspection. Slot machines have variable reward schedules by design — every element is optimized to maximize time on device. Social media optimizes for engagement, and compulsive behavior is the predictable output. The optimization target produces the addiction.

What's Anthropic's optimization target??? Getting you the right answer as fast as possible! The variability in agent output is working against that goal, not serving it. If they could make it right 100% of the time, they would — and the "slot machine" nonsense disappears entirely. On capped plans, both you and Anthropic are incentivized to minimize interactions, not maximize them. That's the opposite of a casino. It's ... alignment (of a sort)

An unreliable tool that the manufacturer is actively trying to make more reliable is not a slot machine. It's a tool that isn't finished yet.

I've been building a space simulator for longer than some of the people diagnosing me have been programming. I built things obsessively before LLMs. I'll build things obsessively after.

The pathologizing of "person who likes making things chooses making things over Netflix" requires you to treat passive consumption as the healthy baseline, which is obviously a claim nobody in this conversation is bothering to defend.

CGMthrowaway 2 hours ago||

> The gambling analogy completely falls apart on inspection. Slot machines have variable reward schedules by design — every element is optimized to maximize time on device. Social media optimizes for engagement, and compulsive behavior is the predictable output. The optimization target produces the addiction.

Intermittent variable rewards, whether produced by design or merely as a byproduct, will induce compulsive behavior, no matter the optimization target. This applies to Claude

Aurornis 35 minutes ago|||

> Intermittent variable rewards, whether produced by design or merely as a byproduct, will induce compulsive behavior, no matter the optimization target.

This is an incorrect understanding of intermittent variable reward research.

Claims that it "will induce compulsive behavior" are not consistent with the research. Most rewards in life are variable and intermittent and people aren't out there developing compulsive behavior for everything that fits that description.

There are many counter-examples, such as job searching: It's clearly an intermittent variable reward to apply for a job and get a good offer for it, but it doesn't turn people into compulsive job-applying robots.

The strongest addictions to drugs also have little to do with being intermittent or variable. Someone can take a precisely measured abuse-threshold dose of a drug on a strict schedule and still develop compulsions to take more. Compulsions at a level that eclipse any behavior they'd encounter naturally.

Intermittent variable reward schedules can be a factor in increasing anticipatory behavior and rewards, but claiming that they "will induce compulsive behavior" is a severe misunderstanding of the science.

ctoth 2 hours ago||||

Sometimes I will go out and I will plant a pepper plant and take care of it all summer long and obsessively ensure it has precisely the right amount of water and compost and so on... and ... for some reason (maybe I was on vacation and it got over 105 degrees?) I don't get a good crop.

Does this mean I should not garden because it's a variable reward? Of course not.

Sometimes I will go out fishing and I won't catch a damn thing. Should I stop fishing?

Obviously no.

So what's the difference? What is the precise mechanism here that you're pointing at? Because sometimes life is disappointing is a reason to do nothing. And yet.

roblh 1 hour ago|||

It's a not a binary thing, it's a spectrum. There are many elements of uncertainty in every action imaginable. I'm inclined to agree with the other commenter though, the LLM slot machine is absolutely closer on that spectrum to gambling than your example is.

Anthropic's optimization target is getting you to spend tokens, not produce the right answer. It's to produce an answer plausible enough but incomplete enough that you'll continue to spend as many tokens as possible for as long as possible. That's about as close to a slot machine as I can imagine. Slot rewards are designed to keep you interested as long as possible, on the premise that you _might_ get what you want, the jackpot, if you play long enough.

Anthropic's game isn't limited to a single spin either. The small wins (small prompts with well defined answers) are support for the big losses (trying to one shot a whole production grade program).

Aurornis 34 minutes ago|||

> Anthropic's optimization target is getting you to spend tokens, not produce the right answer.

The majority of us are using their subscription plans with flat rate fees.

Their incentive is the precise opposite of what you say. The less we use the product, the more they benefit. It's like a gym membership.

I think all of the gambling addiction analogies in this thread are just so strained that I can't take them seriously. Even the basic facts aren't even consistent with the real situation.

8note 1 hour ago||||

im on a subscription though.

they want me to not spend tokens. that way my subscription makes money for them rather than costing them electricity and degrading their GPUs

sweetjuly 31 minutes ago||

Wouldn't that apply only to a truly unlimited subscription? Last I looked all of their subs have a usage limit.

If you're on anything but their highest tier, it's not altogether unreasonable for them to optimize for the greatest number of plan upgrades (people who decide they need more tokens) while minimizing cancellations (people frustrated by the number of tokens they need). On the highest tier, this sort of falls apart but it's a problem easily solved by just adding more tiers :)

Of course, I don't think this is actually what's going on, but it's not irrational.

pixl97 1 hour ago|||

> you'll continue to spend as many tokens as possible for as long as possible.

I mean this only works if Anthropic is the only game in town. In your analogy if anyone else builds a casino with a higher payout then they lose the game. With the rate of LLM improvement over the years, this doesn't seem like a stable means of business.

outofpaper 2 hours ago|||

??? I'm pretty sure you know what the differences are. Go touch grass and tell me it's the same as looking at a plant on a screen.

Dealing with organic and natural systems will, most of the time, have a variable reward. The real issue comes from systems and services designed to only be accessible through intermittent variable rewards.

Oh, and don't confuse Claude's artifacts working most of the time with them actually optimizing to be that way. They're optimizing to ensure token usage. I.E. LLMs have been fine-tuned to default to verbose responses. They are impressive to less experienced developers, often easier to detect certain types of errors (eg. Improper typing), and will make you use more tokens.

squeaky-clean 1 hour ago||

So gambling is fine as long as I'm doing it outside. Poker in a casino? Bad. Poker in a foresty meadow, good. Got it.

mikkupikku 55 minutes ago||

Basically true tbqh. Poker is maybe the one exception, but you're almost always better off gambling "in the wild" e.g. poker night with your buds instead of playing slots or anything else where "the house" is always winning in the long run. Are your losses still circulating in your local community, or have they been siphoned off by shareholders on the other side of the world? Gambling with friends is just swapping money back and forth, but going to a casino might as well be lighting the money on fire.

bonoboTP 1 hour ago||||

And that's only bad if it's illusory or fake. This reaction evolved because it's adaptive. In slot machines the brain is tricked to believe there is some strategy or method to crack and the reward signals make the addict feel there is some kind of progress being made in return to some kind of effort.

The variability in eg soccer kicks or basketball throws is also there but clearly there is a skill element and a potential for progress. Same with many other activities. Coding with LLMs is not so different. There are clearly ways you can do it better and it's not pure randomness.

pixl97 2 hours ago|||

>Intermittent variable rewards,

So you're saying businesses shouldn't hire people either?

scuff3d 1 hour ago|||

Right. A platform who makes money the more you have to use it is definitely optimizing to get you the right answer in as few tokens as possible.

There is absolutely no incentive to do that, for any of these companies. The incentive is to make the model just bad enough you keep coming back, but not so bad you go to a competitor.

We've already seen this play out. We know Google made their search results worse to drive up and revenue. Exact same incentives are at play here, only worse.

ctoth 56 minutes ago||

Please go read how the Anthropic max plan works.

IF I USE LESS TOKENS, ANTHROPIC GETS MORE MONEY! You are blindly pattern matching to "corporation bad!" without actually considering the underlying structure of the situation. I believe there's a phrase for this to do with probabilistic avians?

mrbungie 2 hours ago|||

> What's Anthropic's optimization target??? Getting you the right answer as fast as possible!

Are you totally sure they are not measuring/optimizing engagement metrics? Because at least I can bet OpenAI is doing that with every product they have to offer.

Aurornis 1 hour ago|||

> The gambling analogy completely falls apart on inspection.

The analogy was too strained to make sense.

Despite being framed as a helpful plea to gambling addicts, I think it’s clear this post was actually targeted at an anti-LLM audience. It’s supposed to make the reader feel good for choosing not to use them by portraying LLM users as poor gambling addicts.

mossTechnician 1 hour ago||

At one point, people said Google's optimization target was giving you the right search results as soon as possible. What will prevent Anthropic from falling into the same pattern of enshittification as its predecessors, optimizing for profit like all other businesses?

mikkupikku 1 hour ago||

I stopped using Google years ago because they stopped trying to provide good search results. If Anthropic stops trying to provide a good coding agent, I'll stop using them too.

BoxFour 2 hours ago||

I wish the author had stuck to the salient point about work/life balance instead of drifting into the gambling tangent, because the core message is actually more unsettling. With the tech job market being rough and AI tools making it so frictionless to produce real output, the line between work time and personal time is basically disappearing.

To the bluesky poster's point: Pulling out a laptop at a party feels awkward for most; pulling out your phone to respond to claude barely registers. That’s what makes it dangerous: It's so easy to feel some sense of progress now. Even when you’re tired and burned out, you can still make progress by just sending off a quick message. The quality will, of course, slip over time; but far less than it did previously.

Add in a weak labor market and people feel pressure to stay working all the time. Partly because everyone else is (and nobody wants to be at the bottom of the stack ranking), and partly because it’s easier than ever to avoid hitting a wall by just "one more message". Steve Yegge's point about AI vampires rings true to me: A lot of coworkers I’ve talked to feel burned out after just a few months of going hard with AI tools. Those same people are the ones working nights and weekends because "I can just have a back-and-forth with Claude while I'm watching a show now".

The likely result is the usual pattern for increases in labor productivity. People who can’t keep up get pushed out, people who can keep up stay stuck grinding, and companies get to claim the increase in productivity while reducing expenses. Steve's suggestion for shorter workdays sound nice in theory, but I would bet significant amounts of money the 40-hour work week remains the standard for a long time to come.

nharada 1 hour ago||

Another interesting thing here is that the gap between "burned out but just producing subpar work" and "so crispy I literally cannot work" is even wider with AI. The bar for just firing off prompts is low, but the mental effort required to know the right prompts to ask and then validate is much higher so you just skip that part. You can work for months doing terrible work and then eventually the entire codebase collapses.

Aurornis 59 minutes ago||

> With the tech job market being rough and AI tools making it so frictionless to produce real output, the line between work time and personal time is basically disappearing.

This isn't generally true at all. The "all tech companies are going to 996" meme comes up a lot here but all of the links and anecdotes go back to the same few sources.

It is very true that the tech job market is competitive again after the post-COVID period where virtually nobody was getting fired and jobs were easy to find.

I do not think it's true that the median or even 90th percentile tech job is becoming so overbearing that personal time is disappearing. If you're at a job where they're trying to normalize overwork as something everyone is doing, they're just lying to you to extract more work.

BoxFour 36 minutes ago||

It would never show up as some explicit rule or document. It just sort of happens when a few things line up: execs start off-handedly praising 996, stack ranking is still a thing, and the job market is bad enough that getting fired feels genuinely dangerous.

It starts with people who feel they’ve got more to lose (like those supporting a family) working extra to avoid looking like a low performer, whether that fear is reasonable or not. People aren’t perfectly rational, and job-loss anxiety makes them push harder than they otherwise would. Especially now, when "pushing harder" might just mean sending chat messages to claude during your personal time.

Totally anecdotal (strike 1), and I'm at a FAANG which is definitely not the median tech job (strike 2), but it’s become pretty normal for me to come back Monday to a pile of messages sent by peers over the weekend. A couple years ago even that was extremely unusual; even if people were working on the weekend they at least kept up a facade that they weren't.

simonw 2 hours ago||

I know it's popular comparing coding agents to slot machines right now, but the comparison doesn't entirely hold for me.

It's more like being hooked on a slot machine which pays out 95% of the time because you know how to trick it.

(I saw "no actual evidence pointing to these improvements" with a footnote and didn't even need to click that footnote to know it was the METR thing. I wish AI holdouts would find a few more studies.)

Steve Yegge of all people published something the other day that has similar conclusions to this piece - that the productivity boost for coding agents can lead to burnout, especially if companies use it to drive their employees to work in unsustainable ways: https://steve-yegge.medium.com/the-ai-vampire-eda6e4f07163

saulpw 1 hour ago||

Yeah I'm finding that there's "clock time" (hours) and "calendar time" (days/weeks/months) and pushing people to work 'more' is based on the fallacy that our productivity is based on clock time (like it is in a factory pumping out widgets) rather than calendar time (like it is in art and other creative endeavors). I'm finding that even if the LLM can crank out my requested code in an hour, I'll still need a few days to process how it feels to use. The temptation is to pull the lever 10 times in a row because it was so easy, but now I'll need a few weeks to process the changes as a human. This is just for my own personal projects, and it makes sense that the business incentives would be even more intense. But you can't get around the fact that, no matter how brilliant your software or interface, customers are not going to start paying in a few hours.

simonw 1 hour ago||

> The temptation is to pull the lever 10 times in a row because it was so easy, but now I'll need a few weeks to process the changes as a human

Yeah I really feel that!

I recently learned the term "cognitive debt" for this from https://margaretstorey.com/blog/2026/02/09/cognitive-debt/ and I think it's a great way to capture this effect.

I can churn out features faster, but that means I don't get time to fully absorb each feature and think through its consequences and relationships to other existing or future features.

mrbungie 2 hours ago|||

If you are really good and fast validating/fixing code output or you are actually not validating it more than just making sure it runs (no judging), I can see it paying out 95% of the time.

But for what I've seen both validating my and others coding agents outputs I'd estimate a much lower percentage (Data Engineering/Science work). And, oh boy, some colleages are hooked to generating no matter the quality. Workslop is a very real phenomenon.

biophysboy 2 hours ago||

This matches my experience using LLMs for science. Out of curiosity, I downloaded a randomized study and the CONSORT checklist, and asked Claude code to do a review using the checklist.

I was really impressed with how it parsed the structured checklist. I was not at all impressed by how it digested the paper. Lots of disguised errors.

baq 1 hour ago||

try codex 5.3. it's dry and very obviously AI; if you allow a bit of anthropomorphisation, it's kind of high-functioning autistic. it isn't an oracle, it'll still be wrong, but it's a powerful, completely different from claude tool.

biophysboy 1 hour ago||

Does it get numbers right? One of the mistakes it made in reading the paper was swapping sets of numbers from the primary/secondary outcomes.

baq 1 hour ago||

it does get screenshots right for me, but obviously I haven't tried on your specific paper. I can only recommend trying it out, it's also has a much more generous limits in the $20 tier than opus.

biophysboy 58 minutes ago||

I see. To clarify, it parsed numbers in the pdf correct, but assigned them the wrong meaning. I was wondering if codex is better at interpreting non text data

r00tanon 1 hour ago|||

I was going to mention Yegge's recent blog posts mirroring this phenomena.

There's also this article on hbr.org https://hbr.org/2026/02/ai-doesnt-reduce-work-it-intensifies...

This is a real thing, and it looks like classic addiction.

Retr0id 2 hours ago|||

It's 95% if you're using it for the stuff it's good at. People inevitably try to push it further than that (which is only natural!), and if you're operating at/beyond the capability frontier then the success rate eventually drops.

fdefitte 2 hours ago|||

That 95% payout only works if you already know what good looks like. The sketchy part is when you can't tell the diff between correct and almost-correct. That's where stuff goes sideways.

Kiro 2 hours ago|||

Just need to point out that the payout is often above 95% at online casinos. As long as it's below 100 the house still wins.

mikkupikku 1 hour ago||

He means a slot machine that pays you 95% of the time, not a slot machine that pays out 95% of what you put in.

Claude Code wasting my time with nonsense output one in twenty times seems roughly correct. The rest of the time it's hitting jackpots.

fy20 1 hour ago|||

> It's more like being hooked on a slot machine which pays out 95% of the time because you know how to trick it

Right but the <100% chance is actually why slot machines are addictive. If it pays out continuously the behaviour does not persist as long. It's called the partial reinforcement extinction effect.

jrflowers 2 hours ago||

> It's more like being hooked on a slot machine which pays out 95% of the time because you know how to trick it.

“It’s not like a slot machine, it’s like… a slot machine… that I feel good using”

That aside if a slot machine is doing your job correctly 95% of the time it seems like either you aren’t noticing when it’s doing your job poorly or you’ve shifted the way that you work to only allow yourself to do work that the slot machine is good at.

symfrog 2 hours ago||

If you are trying to build something well represented in the training data, you could get a usable prototype.

If you are unfamiliar with the various ways that naive code would fail in production, you could be fooled into thinking generated code is all you need.

If you try to hold the hand of the coding agents to bring code to a point where it is production ready, be prepared for a frustrating cycle of models responding with ‘Fixed it!’ while only having introduced further issues.

Shank 2 hours ago||

I think that in a world where code has zero marginal cost (or close to zero, for the right companies), we need to be incredibly cognizant of the fact that more code is not more profit, nor is it better code. Simpler is still better, and products with taste omit features that detract from vision. You can scaffold thousands of lines of code very easily, but this makes your codebase hard to reason about, maintain, and work in. It is like unleashing a horde of mid-level engineers with spec documents and coming back in a week with everything refactored wrong. Sure you have some new buttons but does anyone (or can any AI agent, for that matter) understand how it works?

And to another point: work life balance is a huge challenge. Burnout happens in all departments, not just engineering. Managers can get burnout just as easily. If you manage AI agents, you'll just get burnout from that too.

dcre 2 hours ago||

How are we still citing the (excellent) METR study in support of conclusions about productivity that its authors rightly insist[0] it does not support?

My paraphrase of their caveats:

- experts on their own open source proj are not representative of most software dev

- measuring time undervalues trading time for effort

- tools are noticeably better than they were a year ago when the study was conducted

- it really does take months of use to get the hang of it (or did then, less so now)

Before you respond to these points, please look at the full study’s treatment of the caveats! It’s fantastic, and it’s clear almost no one citing the study actually read it.

[0]: https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o...

Aurornis 2 hours ago||

After actually using LLM coding tools for a while, some of these anti-LLM thinkpieces feel very contrived. I don’t see the comparison to gambling addiction at all. I understand how someone might believe that if they only view LLMs through second hand Twitter hot takes and think that it’s a process of typing a prompt and hoping for the best. Some people do that, but the really effective coders work with the LLM and drive the coding, writing some or much of the code themselves. The social media version of vibe coding where you just prompt continuously and hope for the best is not going to work in any serious endeavor where details matter. We see claims of it in some high profile examples like OpenClaw, but even OpenClaw has maintainers and contributors who look at the code and make decisions. It’s also riddled with security problems as a result of the YOLO coding style.

shaokind 2 hours ago||

One of my recent thoughts is that Claude Code has become the most successful agent partially because it is more of a black box than previous implementations of the agent pattern: the actual code changes aren't shoved in your face like Cursor (used to be), they are hidden away. You focus more on the result rather than the code building up that result, and so you get into the "just one more feature" mindset a lot more, because you're never concerned that the code you're building is sloppy.

mikkupikku 1 hour ago|

It's because claude code will test its work and adjust the code accordingly. The old way, with the way Cursor used to be or the way I used to copy and paste code from ChatGPT doesn't work because to iterate in towards a working solution requires too much human effort, making the whole thing pointless.

shaokind 1 hour ago||

Cursor & its various clones (Cline, Roo Cline/Code) did that too, before Claude Code was even released.

htfu 2 hours ago|

Probably the best we can hope for at the moment is a reduction in the back-and-forth, increase in ability to one-shot stuff with a really good spec. The regular human work then becomes building that spec, in regular human (albeit AI-assisted) ways.

ErroneousBosh 2 hours ago|

Is the "back and forth" thing normal for AI stuff, then? Because every time I've attempted to use Claude or Copilot for coding stuff, it's been completely unable to do anything on its own, and I've ended up writing all of the code while it's just kind of introduced misspellings into it.

Maybe someone can show me how you're supposed to do it, because I have seen no evidence that AI can write code at all.

jazzyjackson 2 hours ago|||

Step 1: deposit money into an Anthropic API account

Step 2: download Zed and paste in your API Key

Step 3: Give detailed instructions to the assistant, including writing ReadMe files on the goal of the project and the current state of the project

Step 4: stop the robot when it's making a dumb decision

Step 5: keep an eye on context size and start a new conversation every time you're half full. The more stuff in the context the dumber it gets.

I spent about 500 dollars and 16 hours of conversation to get an MVP static marketplace [0], a ruby app that can be crawled into static (and js-free!) files, without writing a single line of code myself, because I don't know ruby. This included a rather convoluted data import process, loading the database from XML files of a couple different schemas.

Only thing I had to figure out on my own was how to upload the 140,000 pages to cloudflare free tier.

[0] https://motorcycledealer.com/

ErroneousBosh 1 hour ago|||

> Step 4: stop the robot when it's making a dumb decision

Yeah I can't stop myself when I'm about to make a dumb decision, just look at my github repo. I ported Forth to a 1980s sampler and wrote DSP code on an 8-bit Arduino.

How am I going to stop a robot making dumb decisions?

Also, this all sounds like I'm doing a lot of skivvy work typing stuff in (which I hate) and not actually writing much code (which is the bit I like).

hyperadvanced 10 minutes ago||

The robot will output text like “Oh, I see, the user wants me to make a Lovecraftian horror with asynchronous subprocess calls instead of HTTP endpoints, so I better suggest we reinstall the dependencies that are already installed so we can sacrifice this project to Mammoth”

It is at this point where you can say “NONONO YOU ABSOLUTE DONKEY stop that we just want a FastAPI endpoint!!” And it will go “You’re absolutely right, I was over complicating this!”

verdverm 1 hour ago|||

Step 1 is where Anthropic lost me.

1. If you don't use it soon enough, they keep it (shame on them, do the things you need to in order to be a money transmitter, you have billions of dollars)

2. Pay-go with billing warning and limits. You can use Claude like this through Google VertexAI

htfu 2 hours ago||||

Very much normal yes. This is why I've been (so far) still mainly sticking to having it as an all-knowing oracle telling me what I need to know, which it mostly does successfully.

When it works for pure generation it's beautiful, when it doesn't it's ruinous enough to make me take two steps back. I'll have another go at getting with all the pure agentic rage everyone's talking about soon enough.

8note 1 hour ago|||

there's a lot of back and forth for describing what you actually want, design, the constraints, and testing out the feeback loops you set up for it to be able to tell tell if its on the right track or not.

when its actually writing code its pretty hands off, unless you need to course correct to point it in a better direction

More comments...