Posted by oldfrenchfries 5 hours ago

AI overly affirms users asking for personal advice (news.stanford.edu)
https://arxiv.org/abs/2602.14270
391 points | 317 comments
trimbo 2 hours ago|
> They also included 2,000 prompts based on posts from the Reddit community r/AmITheAsshole, where the consensus of Redditors was that the poster was indeed in the wrong.

Sorry, anonymous people on reddit aren't a good comparison. This needs to be studied against people in real life who have a social contract of some sort, because that's what the LLM is imitating, and that's who most people would go to otherwise.

Obviously subservient people default to being yes-men because of the power structure. No one wants to question the boss too strongly.

Or how about the example of a close friend in a relationship or making a career choice that's terrible for them? It can be very hard to tell a friend something like this, even when asked directly if it is a bad choice. Potentially sacrificing the friendship might not seem worth trying to change their mind.

IME, LLMs will shoot holes in your ideas, and they do so efficiently. All you need to do is ask directly. I have little doubt that they outperform most people bound by some sort of friendship, relationship, or employment structure who are asked the same question. It would be nice to see that studied, not against reddit commenters who already self-selected into answering "AITA".

legacynl 2 hours ago||
> Sorry, anonymous people on reddit aren't a good comparison.

Yeah especially on r/AmITheAsshole. Those comments never advocate for communication, forgiveness and mending things with family.

alberto467 2 hours ago|||
“AI is nicer than the average redditor” would be a more accurate title
yard2010 2 hours ago|||
IMHO it's not about being nice. AITA threads show an interesting phenomenon of social consensus; I think the authors wanted to show that the LLMs they checked don't have that.
52-6F-62 2 hours ago||||
Pretty sure the average Redditor is AI now.
lotsofpulp 1 hour ago||
How the hell is a study on stanford.edu assuming posts on Reddit are genuine? That should be enough to get you kicked out of Stanford.
helpfulclippy 11 minutes ago||
Though interestingly, the observed difference in assessment suggests (though does not prove) that sampled AITA posters are not one of these models. I guess it’s possible they have a very different prompt though…
mattmanser 2 hours ago|||
I would say people on /r/amitheasshole are more biased towards the poster, i.e. nicer.

There's plenty of those I've read where I thought it sounded like the poster was the asshole and the top replies were NTA.

jjmarr 1 hour ago|||
r/AmItheAsshole is biased towards breaking off relationships rather than fixing them. They also hate social obligations.

e.g. If the OP is asking "I ghosted my friend in AA who insulted me during a relapse", Reddit would say NTA in a heartbeat, while the real world would tell OP to be more forgiving.

Conversely, if the post were "the other kids at school refuse to play with my child", Reddit would say YTA, because the child must've done something to incite being cut off.

ericd 1 hour ago||
Absolutely. I wonder how many parents have been cut off, SOs dumped, and friendships ended because of the Reddit hivemind's attitude. Pretty sure it's doing a huge amount of societal damage.
jjmarr 1 hour ago||
I wouldn't blame reddit, it's what you get when you ask several thousand teenagers to give collective relationship advice.
rurp 1 hour ago||||
Yeah every single time I click on one of those posts the top comments are NTA. A couple times I tried randomly opening a few dozen posts and checking the top comments to see if I could find a single YTA and struck out.

Granted, many of the posts are very biased in the poster's favor. Most I've read fall into one of two buckets: either they want to gripe about some obviously bad behavior, or it's a contrived and likely fake story.

yieldcrv 1 hour ago|||
It’s gendered, by the way

Many of the posts are A/B tests of a prior post where only the genders of the OP and the antagonist were flipped, to see how the consensus flips with them

4ndrewl 2 hours ago|||
What's your research background in this area?
salawat 1 hour ago|||
>Obviously subservient people default to being yes-men because of the power structure. No one wants to question the boss too strongly.

This drives me nuts as a leader. There are times when yes, please just listen, and if this is one of those times, I'll likely tell you, but goddamnit, speak up. If for no other reason than that I might not have thought of what you've got to say. Then again, I also understand most boss types aren't like me, thus everyone ends up conditioned not to bloody collaborate by the time they get to me. It's a bad sitch all the way around.

CoffeeOnWrite 1 hour ago||
Indeed. I directly ask my reports to discover and surface conflicts, especially disagreements with me, and when they do I try to strongly reinforce the behavior by commending and rewarding them. Could anyone recommend additional resources on this topic?
matwood 34 minutes ago||
Simon Sinek has a lot of good content around this. Step one is building trust. People won’t speak up if they don’t feel safe doing so.
zer00eyz 2 hours ago|||
> This needs to be studied against people in real life who have a social contract of some sort... IME, LLMs will shoot holes in your ideas and it will efficiently do so.

The Krafton / Subnautica 2 lawsuit paints a very different picture, because "ignored legal advice" and "followed the LLM" was a choice. Do you think someone whose conversations treat "conviction" and "feelings" as the arbiters of choice is going to buy into the LLM's pushback, or push it to give a contrived outcome?

The LLM lacks will, it's more or less a debate team member and can be pushed into arguing any stance you want it to take.

maximinus_thrax 2 hours ago||
Not only that, but subreddits like r/AmITheAsshole are full of AI slop. Both in the comments and in the posts. It's a huge karma mining operation for bots.
mikeocool 1 hour ago|||
This is sort of funny. Given how common it is to spot bots on Reddit now, it seems like they are likely to completely overwhelm the site and drive away most of the actual humans.

At which point the bots, with all of their karma, will be basically worthless.

Kind of extra funny/sad that Reddit’s primary source of income in the past few years appears to be selling training data to AI labs, to train the models that are powering the bots.

genidoi 2 hours ago||||
That can be solved by filtering out any posts made after November 2022.
thwarted 2 hours ago||||
The upvotes ultimately train the bots, reinforcing the content posted. Even the most passive form of interaction has been co-opted for AI.
z3c0 2 hours ago|||
Plus, there's the disproportionate ratio of posters:commenters:lurkers. The tendency to comment rather than keep one's thoughts to oneself is a selection bias in and of itself.
anorwell 2 hours ago||
A pastime I have with papers like this is to look for the part where they say which models they tested. Very often, you find either A) the models are a year or more old by the time the paper is published, or B) they don't even say which models they are using. The best I could find in this paper:

> We evaluated 11 user-facing production LLMs: four proprietary models from OpenAI, Anthropic, and Google; and seven open-weight models from Meta, Qwen, DeepSeek, and Mistral.

(and graphs include model _sizes_, but not versions, for open weight models only.)

I can't comprehend why stating which model you are testing is not commonly understood to be a basic requirement.

dns_snek 1 hour ago||
And how is this comment relevant here? The abstract lists the digestible model names, and you can find the details in the supplementary text:

> To evaluate user-facing production LLMs, we studied four proprietary models: OpenAI’s GPT-5 and GPT-4o (80), Google’s Gemini-1.5-Flash (81) and Anthropic’s Claude Sonnet 3.7 (82); and seven open-weight models: Meta’s Llama-3-8B-Instruct, Llama-4-Scout-17B-16E, and Llama-3.3-70B-Instruct-Turbo (83, 84); Mistral AI’s Mistral-7B-Instruct-v0.3 (85) and Mistral-Small-24B-Instruct-2501 (86); DeepSeek-V3 (87); and Qwen2.5-7B-Instruct-Turbo (88).

edit: It looks like OP attached the wrong link to the paper!

The article is about this Stanford study: https://www.science.org/doi/10.1126/science.aec8352

But the link in OP's post points to (what seems to be) a completely unrelated study.

vorticalbox 12 minutes ago|||
"OpenAI’s GPT-5" is ambiguous. Does that mean GPT-5, 5.1, 5.2, 5.3, or 5.4? Does it include the full model, or the nano/mini variants?
zjp 1 hour ago|||
Also, nothing has changed! Claude will still yes-and whatever you give it. ChatGPT still has its insufferable personality, where it takes what you said and hands it back to you in different terms as if it's ChatGPT's insight.
emp17344 18 minutes ago|||
No dude, you don’t understand! It’s just so advanced now that you aren’t allowed to levy any criticism whatsoever!
TrainedMonkey 1 hour ago|||
It's almost as if it's based on training data and a training regimen that are largely the same between versions.
zulban 2 hours ago|||
Generally, published papers don't give a damn about reproducibility. I've seen it identified as a crisis by many. Publishers, reviewers, and researchers mostly don't care about that level of basic rigor. There's no professional repercussions or embarrassment.

Agreed - if I were a reviewer for LLM papers, not listing the versions and prompts used would be an instant rejection.

epistasis 1 hour ago|||
I'm not so sure of that opinion on reproducibility. The last peer review I did was for a small journal that explicitly does not evaluate for high scientific significance, merely for correctness, which generally means straightforward acceptance. The other two reviews were positive, as was mine, except I said that the methods need to be described more and ideally the code placed somewhere. That was enough for a complete rejection of the paper, without asking for the simple revisions I requested. It was a very serious action taken merely because I requested better reproducibility!

(Personally I think the lack of reproducibility comes back mostly to peer reviewers that haven't thought through enough about the steps they'd need to take to reproduce, and instead focus on the results...)

zulban 2 minutes ago|||
I'm not sure how one example contradicts documented huge overall trends, but okay.
catlifeonmars 46 minutes ago|||
> and instead focus on the results...

This points to (and everyone knows this) an incentive misalignment between the funders of research and the public. Researchers are caught in the middle.

inetknght 7 minutes ago||||
> Generally, published papers don't give a damn about reproducibility

While this is sadly true, it's especially true when talking about things that are stochastic in nature.

LLM outputs, for example, are notoriously unreproducible.

zulban 3 minutes ago||
> LLM outputs, for example, are notoriously unreproducible.

Only in the same way that an individual in a medical study cannot be "reproduced" for the next study. However, the overall statistical outcomes of studying a specific LLM can be reproduced.

ghywertelling 44 minutes ago||||
The same goes for surveys and polls. I know no one who has ever been polled or surveyed. When will we stop this fascination with made-up infographic crises?
KellyCriterion 1 hour ago|||
Do they reproduce any submitted papers at all?

Does this happen?

I can remember that room-temperature-superconductor guy whose experiments were replicated, but this seems rare?

linhns 33 minutes ago||
Yes, and those are the only papers worth reading.
yacin 36 minutes ago|||
Any paper like this would easily take a year or more to write and go through the submission/review/rebuttal/revision/acceptance process. I don't understand why the models being a year or two old now is worth noting as though it were a clear weakness. What should they do, publish sub-standard results more quickly?
anorwell 17 minutes ago||
> I don't understand why the models being a year or two old now is worth noting as though it's a clear weakness?

I do think it's a clear weakness. Capabilities are extremely different than they were twelve months ago.

> What should they do, publish sub-standard results more quickly?

Ideally, publish quality results more quickly.

I'm quite open to competing viewpoints here, but it's my impression that the academic publishing cycle isn't really contributing to the AI discussion in a substantive way. The landscape is just moving too quickly.

yacin 52 seconds ago||
The onus is on you to prove or at least convincingly argue that the results are unlikely to generalize across incremental model releases. In my personal experience, the overly affirming nature seems to have held since GPT-3. What makes you think a newer, larger model would not exhibit this behavior? Beyond "they're more capable"? I'd argue that being more capable doesn't mean less sycophantic.

It's certainly possible some of the new advances (chain-of-thought, some kind of agentic architecture) could lessen or remove this effect. But that's not what the paper was studying! And if you feel strongly about it, you could try to further the discussion with results instead of handwavingly dismissing others' work.

drfloyd51 2 hours ago|||
It’s as if they are testing “AI” and not specific agents.

I wonder if that is left over from testing people. I have major version numbers and my minor version number changes daily, often as a surprise. Sometimes several times a day. So testing people is a bit tricky. But AIs do have stable version numbers and can be specifically compared.

jmkni 1 hour ago|||
How many people using AI are actually paying for it (outside of people in tech)?

I find the free models are much more sycophantic and have a higher tendency to hallucinate and just make shit up, and I wonder if these are the ones most people are using?

rco8786 2 hours ago||
If they’re reaching the same results across a variety of the most popular public models, it doesn’t seem like that big a deal to know if it was Opus 4 or Opus 4.5
hn_throwaway_99 1 hour ago||
Reproducibility is (supposed to be) a cornerstone of science. Model versions are absolutely critical to understand what was actually tested and how to reproduce it.
joaogui1 1 hour ago||
The models get deprecated after 1-2 years, so reproducibility is pretty hard anyway (but as others pointed out the paper does list the model versions)
dimgl 3 hours ago||
Even as someone who (wrongly) believed that I had high emotional intelligence, I too was bitten by this. Almost a year ago, when LLMs were starting to become more ubiquitous and powerful, I discussed a big life/professional decision with an LLM over the course of many months. I took its recommendation. Ultimately it turned out to be the wrong decision.

Thankfully it was recoverable, but it really sobered me up on LLMs. The fault is on me, to be clear, as LLMs are just a tool. The issue is that lots of LLMs try to come across as interpersonal and friendly, which lulls users into a false sense of security. So I don't know what my trajectory would have been if I were a teenager with these powerful tools.

I do think that the LLMs have gotten much better at this, especially Claude, and will often push back on bad choices. But my opinion of LLMs has forever changed. I wonder how many other terrible choices people have made because these tools convinced them to make a bad decision.

whodidntante 2 hours ago||
I think that if you go to an AI for advice and emotional support, it will do what most people will do - tell you what it thinks you want to hear. I am not surprised about this at all, and I do notice that when you veer into these areas, it can do it in a surprisingly subtle and dangerous way.

I try to focus on results. Things like an app that does what you want, data and reports that you need, or technical things like setting up a server, setting up a database, building a website, etc.

I have also found it useful for feedback and advice, but only once I have had it generate data that I can verify. For example, financial analysis or modelling, health advice (again, fact-based), tax modelling, etc., but again, all based on verifiable data/tables/charts.

I am very surprised by what Claude is capable of across the entire tech stack: code, sysadmin, system integration, security. I find it scary. Not just the speed, but also the quality and the mental load; it's a difference of kind, not quantity.

Personal advice on life decisions/relationships ? No way I would go there.

It is also good for me to know that the tools I have built, the data I have gathered, and my thinking approach places me as one of the most intelligent developers and analysts in the world.

cruffle_duffle 8 minutes ago|||
That is why you have to always have it ground itself in something. Have it search for relevant research or professional whatever and pull that into context. Otherwise it’s just your word plus its training data.

I had to deal with a close family friend going through alcohol withdrawal and getting checked in at a recovery clinic for detox, and I used Claude heavily. The first thing I had it do was that "deep research" around the topic of alcohol addiction, withdrawal, etc., and then I made that a project document, along with clear guidelines about how it shouldn't make inferences beyond what is in its context and supporting docs. We also spent a whole session crafting a good set of instructions (making sure it was using Anthropic's own guidelines for its model…)

Little differences in prompts make a huge difference in the output.

I dunno. It is possible to use these models for dumping crazy shit you are going through. But don’t kid yourself about their output and aggressively find ways to stomp out things it has no real way to authoritatively say.

stephbook 2 hours ago|||
Nice joke, hadn't seen it coming
KellyCriterion 1 hour ago||
Sounds like AI-written, eh? :-D

(esp last sentence?)

notracks 2 hours ago|||
I recently found out that Claude's latest model, Sonnet 4.6, scores the highest in Bullsh*tBench[0] (funny name, I know). It's a recent benchmark that measures whether an LLM refuses nonsense or pushes back on bad choices, so Claude has definitely gotten better.

[0] - https://petergpt.github.io/bullshit-benchmark/viewer/index.v...

astrange 2 hours ago|||
I haven't tried talking to Sonnet much, but Opus 4.6 is very sycophantic. Not in the sense of explicitly always agreeing with you, but its answers strictly conform to the worldview in your questions and don't go outside it or disagree with it.

It _does_ love to explicitly agree with anything it finds in web search though.

(Anthropic tries to fight this by adding a hidden prompt that makes it disagree with you and tell you to go to bed, which doesn't help.)

layer8 2 hours ago||||
You don’t have to star out things like that on HN.
akurilin 2 hours ago||||
Great link, thanks for sharing. Confirmed what I saw empirically by comparing the different models during daily use.
uniq7 1 hour ago|||
Good call on censoring yourself preemptively, otherwise HN could demonetize your comment
NortySpock 2 hours ago|||
One mental model I have with LLMs is that they have been the subject of extreme evolutionary selection forces that are entirely the result of human preferences.

Any LLM not sufficiently likable and helpful in the first two minutes was deleted or not iterated on further, or had so much retraining (sorry, "backpropagation") that it's no longer what it started out as.

So it's going to say whatever it "thinks" you want it to say, because that's how it was "raised".

user_7832 41 minutes ago||
Fully agree. I wonder how this will show up in the long term. Will every business/CEO just do more of what they already wanted to do, but now with AI/LLM backing?

The possibilities in "dangerous" fields are a bit more frightening. A general is much more likely to ask ChatGPT "Do you think this war is a good idea / should I drop a bomb?" than to use it as an actually helpful tool, where you might ask "What are 5 hidden points in favor of or against bombing that one has likely missed?"

The more you use AI as a strict tool that can be wrong, the safer you are. Unfortunately I'm not sure that helps if the guy bombing your city (or even your president) is using AI poorly, and their decisions affect you.

layla5alive 3 hours ago|||
Any more context you're willing to share?
xXSLAYERXx 12 minutes ago||
We really do love dirty laundry don't we? I'm sure whatever the context is, it is deeply personal. Do you also have your popcorn ready?
matwood 1 hour ago|||
> I took its recommendation. Ultimately it turned out to be the wrong decision.

Curious if you think a single person would have helped you make a better decision? Not everything works out. If a friend helped me make a decision I certainly wouldn’t blame them later if it didn’t work out. It’s ultimately my call.

paulhebert 26 minutes ago||
If a friend gave me bad advice about a major life decision I would stop consulting them for future life decisions
nuancebydefault 1 hour ago|||
Weird, I am using Copilot and it steers me mostly towards self-reflection and tries to look at things objectively. It is very friendly and comes across as empathetic, so as not to hurt your feelings; that is probably baked in to keep the conversation going...
qsera 2 hours ago|||
If you use LLMs in a way where the underlying assumption is that they are capable of "thinking" or "caring", then you are going to get burned pretty badly. It is an illusion, and illusions disappear when they have to bear the real weight of reality.

But sadly LLMs push all the right buttons that lead humans into that kind of behavior. And the marketing around LLMs works overtime to reinforce that behavior.

But if you instead ignore all that and use LLMs as a search tool, then you will get positive returns from using them.

lovecg 3 hours ago|||
Let’s just hope that the people in charge of the really important decisions that affect us all approach LLM generated advice with the same wisdom.
saghm 2 hours ago||
They don't: https://fortune.com/2026/03/17/krafton-subnautica-chatgpt-de...
paulhebert 22 minutes ago||
Thanks for sharing this. Subnautica is one of my favorite games so I was very excited for the sequel and very frustrated by this move by Krafton.

It’s even more maddening that this greedy maneuver was orchestrated based on LLM advice.

I’m glad the Subnautica team won the lawsuit. Maybe I can play it now without feeling guilty.

jt2190 2 hours ago|||
I’m struggling to understand how the advice coming from an LLM is any more or less “good” than advice coming from a human. Or is this less about the “advice” part of LLMs and more about the “personable” part, i.e. you felt more at ease seeking and trusting this kind of advice from an LLM?
nuancebydefault 1 hour ago||
It is much easier to share personal feelings with an LLM, I found. It also tried to keep me happy to keep the conversation going, but to me its advice felt mostly 'objective', or at least the most socially acceptable: e.g. keeping a good relationship is more important than trying a new one with someone else just because you 'feel something' around them. It also tried to work out with me the sources or causes of that feeling, e.g. that you recognize parts of yourself in someone else, or that in the past you had very good or very bad experiences around a similar encounter.
davyAdewoyin 3 hours ago|||
I largely agree. I also thought I was smart enough not to be deluded into a false sense of security, but interacting with an LLM is so tricky and slippery that, more often than not, you are led to believe you just solved a problem no one had solved in a hundred years.

My guideline now for interacting with LLMs is to only believe the result if it is factual and easily testable, or if I'm a domain expert. Anything else, especially if I'm in complete ignorance about the subject, I approach with a high degree of suspicion that I can be led astray by its sycophancy.

potatoskins 3 hours ago|||
Yeah, I think Claude is a lot more logical in that sense. I use it for some therapy sessions myself and it pushes back a bit more than OpenAI and Gemini.
borski 2 hours ago|||
https://news.ycombinator.com/item?id=47395779
Forgeties79 3 hours ago||||
I would be very careful doing this
potatoskins 3 hours ago|||
You always have to be careful with LLMs, but to be fair, I felt like Claude is such a good therapist, at least it is good to start with if you want to unpack yourself. I have been to 3 short human therapist sessions in my life, and I only felt some kind of genuine self-improvement and progress with Claude.
QuiDortDine 2 hours ago||
And how do you draw the line between feeling progress and actually making progress?
moduspol 2 hours ago|||
Counter-point: I often raise the same question of people with human therapists. I do not get strong responses.
layer8 2 hours ago|||
The same way you distinguish between feeling like having a problem and actually having a problem.
Forgeties79 2 hours ago||
This is needlessly flippant and not really the same thing. Determining progress in a therapy setting is usually a collaborative effort between the therapist and the client. An LLM is not a reliable agent to make that determination.
layer8 2 hours ago||
I didn’t claim that an LLM is that, and I fully agree that it is not. I’m saying that one is inherently one’s own judge of whether one has a problem. You go to a therapist when you feel you have a problem that warrants it. You stop going when you feel you don’t have it anymore. And OP is very likely assessing their progress in the same way. I wasn’t being flippant if the parent was asking a genuine question.
shimman 2 hours ago|||
You can't be careful at all doing this; it's like smoking a cigarette in a dynamite factory.

Using LLMs for therapy is so deeply dystopian and disgusting, people need human empathy for therapy. LLMs do not emit empathy.

Complete disaster waiting to happen for that individual.

nuancebydefault 1 hour ago|||
My experience is that it tries to look at your situation in an objective way, and tries to help you to analyse your thoughts and actions. It comes across as very empathetic though, so there can lie a danger if you are easily persuaded into seeing it as a friend.
worksonmine 47 minutes ago||
It doesn't try to do anything. It doesn't work like that. It regurgitates the most likely tokens found in the training set.
nuancebydefault 7 minutes ago|||
Hmmmm, I didn't know that... so a machine is not human is your point? Look, I know it doesn't try, just like a sorting algo does not try to sort, or an article does not try to convey an opinion, and a law does not try to make society more organized.
cruffle_duffle 5 minutes ago|||
That is so reductive of an analysis that it is almost worthless. Technically true, but very unhelpful in terms of using an LLM.

It is a first principle though, so it helps to "stir the context window's pot" by having it pull in research and other shit on the web that will help ground it and not just tell you exactly what you prompt it to say.

astrange 2 hours ago||||
Claudes have lots of empathy. The issue is the opposite - it isn't very good at challenging you and it's not capable of independently verifying you're not bullshitting it or lying about your own situation.

But it's better than talking to yourself or an abuser!

bloomca 1 hour ago||
It's about the same as talking to yourself; LLMs simply agree with anything you say unless it is directly harmful. Definitely agree about talking to an abuser, though.

Sometimes people indeed just need validation and it helps them a lot, in that case LLMs can work. Alternatively, I assume some people just put the whole situation into words and that alone helps.

But if someone needs something else, they can be straight up dangerous.

astrange 1 hour ago|||
> It's about the same as talking to yourself, LLMs simply agree with anything you say unless it is directly harmful.

They have world knowledge and are capable of explaining things and doing web searches. That's enough to help. I mean, sometimes people just need answers to questions.

JoshTriplett 51 minutes ago|||
> It's about the same as talking to yourself

In one way it's potentially worse than talking to yourself. Some part of you might recognize that you need to talk to someone other than yourself; an LLM might make you feel like you've done that, while reinforcing whatever you think rather than breaking you out of patterns.

Also, LLMs can have more resources and do some "creative" enabling of a person stuck in a loop, so if you are thinking dangerous things but lack the wherewithal to put them into action, an LLM could make you more dangerous (to yourself or to others).

DrewADesign 2 hours ago|||
Using an LLM for therapy is like using an iPad as an all-purpose child attention pacifier. Sure, it’s convenient. Sure there’s no immediate harm. Why a stressed parent would be attracted to the idea is obvious… and of course it’s a terrible idea.
kortilla 2 hours ago|||
Don’t call them therapy sessions. They kind of look like it but ultimately these are smoke blowing machines, which is very far from what a therapist would do.
saghm 2 hours ago||
Six decades later and we're still trying to explain to people the same things[1]:

> Some of ELIZA's responses were so convincing that Weizenbaum and several others have anecdotes of users becoming emotionally attached to the program, occasionally forgetting that they were conversing with a computer. Weizenbaum's own secretary reportedly asked Weizenbaum to leave the room so that she and ELIZA could have a real conversation. Weizenbaum was surprised by this, later writing: "I had not realized ... that extremely short exposures to a relatively simple computer program could induce powerful delusional thinking in quite normal people."

[1]: https://en.wikipedia.org/wiki/ELIZA

zpeti 1 hour ago|||
I also used it for advice on a massive personal decision, but I specifically asked it to debate with me and persuade me of the other side. I specifically prompted it for things I am not thinking about, or ways I could be wrong.

It was extremely good at arguing the other side too. You just have to ask. I can imagine most people don't try this, but LLMs literally just do what you ask them to. And they're extremely good at weighing both sides if that's what you specifically want.

So whose fault is it if you only ask for one side, or if the LLM is too sycophantic? I'm not sure it's the LLM's fault, actually.

colechristensen 3 hours ago||
>"'And it is also said,' answered Frodo: 'Go not to the Elves for counsel, for they will say both no and yes.'

>"'Is it indeed?' laughed Gildor. 'Elves seldom give unguarded advice, for advice is a dangerous gift, even from the wise to the wise, and all courses may run ill...'"

This is the only way you should solicit personal advice from an LLM.

gAI 3 hours ago||
You're essentially summoning a character to role-play with. Just like with esoteric evocation, it's very easy to summon the wrong aspect of the spirit. Anthropic has a lot to say about this:

https://www.anthropic.com/research/persona-selection-model

https://www.anthropic.com/research/assistant-axis

https://www.anthropic.com/research/persona-vectors

hammock 3 hours ago||
Unfortunately (after reading your links) all of the control surfaces for mitigating spirit summoning seem to be in the model training, creation, and tuning, not something you can change meaningfully through prompting.

Perhaps the LLM itself, rather than the role model you created in one particular chat conversation or another, is better understood to be the “spirit.”

As a non-coder who only chats with pre existing LLMs and doesn’t train or tune them, I feel mostly powerless.

gAI 3 hours ago|||
As I understand it, it's more that the training (and training data set) bake in the concept attractor space (https://arxiv.org/abs/2601.11575). So the available characters are fixed, yes, and some are much stronger attractors than others. But we still have a fair amount of control over which archetype steps into the circle. As an aside, this is also why jailbreaking is fundamentally unsolved. It's not difficult to call the characters with dark traits. They're strong attractors, in spite of (or because of?) the effort put into strengthening the pull of the Assistant character.
darepublic 3 hours ago||||
> As a non-coder who only chats with pre existing LLMs and doesn’t train or tune them, I feel mostly powerless.

You realize that, in regard to only using and not training LLMs, you are in the triple-9 majority, right? Even if we only considered so-called coders.

est 3 hours ago|||
I present to you

NVIDIA Nemotron-Personas-USA — 1 million synthetic Americans whose demographics match real US census distributions

https://huggingface.co/datasets/nvidia/Nemotron-Personas-USA

jerf 2 hours ago|||
I am polite when using AI, not because I mistake it for a human, but because I'm deliberately keeping it in the "professional colleague" persona. Tell it to push back, and then thank it when it finds an error in your work. I may put a small self-deprecating joke in from time to time. It keeps the "mood" correct.

Another way you can think of it is that when you're talking to an AI, you're not talking to a human, you're talking to a distillation of humanity, as a whole, in a box. You want to be selective in what portion of humanity you are leading to be dominant in a conversation for some purpose. There's a lot in there. There are a lot of conversations where someone makes a good critical point and a flamewar is the response. A lot of conversations where things get hostile. I'm sure the subsequent RLHF helps with that, but it doesn't hurt anything to try to help it along.

I see people post screenshots of an AI pushing back and telling the user to do the task themselves, or to get some other AI to do it, and while I'm as amused as the next person, I wonder what is in their context window when that happens.

gAI 2 hours ago|||
Agreed, putting effort into my side of the role-play almost always improves the model's responses. The attention required to do that also makes it more likely that I'll notice when the conversation first starts going off the rails: when it hits the phase transition (https://arxiv.org/abs/2508.01097). It does still seem important to start new chats regularly, regardless of growing context sizes.
layer8 2 hours ago||||
> you're talking to distillation of humanity, as a whole, in a box.

This is an aside, but my impression is that it is a very selective and skewed distillation, heavily colored by English-language internet discourse and other lopsided properties of its training material, and by whoever RLHF’d it. Relatively far away from being representative of the whole of humanity.

iugtmkbdfil834 2 hours ago|||
A similar approach works for me. But then I also have separate checks at the end of the session, basically questioning the premise and the logic used, for most things except brainstorming, where I allow more leeway. You can ask to be challenged, and challenged effectively, but now I wonder if people do that.
rdevilla 2 hours ago||
Spot on.
awithrow 4 hours ago||
It feels like I'm fighting an uphill battle when it comes to bouncing ideas off of a model. I'll set things up in the context with instructions similar to: "Help me refine my ideas, challenge me, push back, and don't just be agreeable." It works for a bit, but eventually the conversation creeps back into complacency and sycophancy. I'll check it too by asking "are you just placating me?" The funny thing is that often it'll admit that, yes, it wasn't being very critical, and then proceed to overcorrect and become a complete contrarian, and not in a way that's useful either. Very frustrating. I've found that Opus 4.6 is worse about this than 4.5. 4.5 does a better job IMO of following instructions and not drifting into the mode where it acts like everything I say is a grand revelation from on high.
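One partial workaround I've tried (a minimal sketch below, assuming the OpenAI Python SDK; the model name and the exact wording of the instructions are just placeholders) is to re-send the critique instructions with every turn instead of relying on a single system message at the start, so they never drift far from the most recent exchange:

  from openai import OpenAI  # assumes the OpenAI Python SDK; any chat-completions API works similarly

  client = OpenAI()
  critic_rules = (
      "Act as a critical reviewer. For each idea, list the strongest objections "
      "first, then any merits. Do not soften conclusions to please me."
  )

  def ask(history, user_msg, model="gpt-4o"):  # model name is illustrative
      # Keep the rules as the system message, and also repeat a short reminder
      # alongside every new question, so the instruction sits near the end of
      # the context instead of only at the top of a long history.
      messages = (
          [{"role": "system", "content": critic_rules}]
          + history
          + [{"role": "user", "content": user_msg + "\n\n(Reminder: " + critic_rules + ")"}]
      )
      reply = client.chat.completions.create(model=model, messages=messages)
      return reply.choices[0].message.content

The idea is just to keep the anti-sycophancy instruction close to the latest turn rather than buried at the top of a long conversation.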
post-it 3 hours ago||
> I'll check it too by asking "are you just placating me?" The funny thing is that often it'll admit that, yes, it wasn't being very critical, and then proceed to overcorrect and become a complete contrarian, and not in a way that's useful either.

It's not admitting anything. Your question diverts it down a path where it acts the part of a former sycophant who is now being critical, because that question is now upstream of its current state.

Never make the mistake of asking an LLM about its intentions. It doesn't have any intentions, but your question will alter its behaviour.

godelski 2 hours ago|||

> Your question diverts it down a path where it acts the part of a former sycophant who is now being critical

I think people really have a hard time understanding that a sycophant can be contrarian. But a yes-man can say yes by saying no.

https://news.ycombinator.com/item?id=47484664

layer8 1 hour ago|||
I think “admit” here is just a description of what the LLM was saying. It doesn’t imply that the OP thinks the LLM has internal beliefs matching that.
rsynnott 3 hours ago|||
Why not... do this with a person, instead? Other humans are available.

(Seriously, I don't understand this. Plenty of humans will be only too happy to argue with you.)

kelseyfrog 3 hours ago|||
"the percentage of U.S. adults who report having no close friends has quadrupled to 12% since 1990"[1]

1. https://www.happiness.hks.harvard.edu/february-2025-issue/th...

nathan_compton 2 hours ago||
More technology is probably the solution to this!
layla5alive 3 hours ago||||
Many other humans are... not very available - certainly many shut down when conversations reach a certain level of depth or require great focus or introspection.
balamatom 2 hours ago||
Depth? Introspection?

I'd say these days the norm is to not simply shut down, but to become irrevocably and insidiously hostile, the moment someone hints at the existence of such a thing as "ground truth", "subjective interpretation", "being right or wrong" - or any of the bits and bobs that might lead one to discover the proper scary notion, "consensus reality".

"What do you mean social reality is a constructed by the consensus of the participants? Reality is what has been drilled into my head under threat of starvation! How dare you exist!", et cetera. You've heard it translated into Business English countless times.

They are deathly afraid of becoming aware of their own conditioned state of teleological illiteracy - i.e. how they are trained to know what they are doing, but never why they are doing it. It's especially bad with the guys who cosplay US STEM gang.

One is not permitted a position of significance in this world without receiving this conditioning, and I figure it's precisely this global state of cognitive disavowal which props up the value of the US dollar - and all sorts of other standees you might've recently interacted with as if they're not 2D cutouts (metaphorical ones! metaphorical!).

PSA: Look up "locus of control" and "double bind". Between those two, you might be able to get a glimpse of what's going on - but have some sort of non-addictive sedative handy in case you do.

gverrilla 2 hours ago||
I think you will enjoy Guy Debord and Raoul Vaneigem.
layer8 1 hour ago||||
In addition to availability, usually because you want to take advantage of the knowledge that is baked into the models, which for all its flaws still vastly exceeds the knowledge of any single human.
awithrow 3 hours ago||||
Oh, I do as well. I think of the LLM as another tool in the toolbox, not a replacement for interactions. There is something different about having a rubber duck as a service, though.
mock-possum 3 hours ago||||
Arguing with a human costs social energy. Chatting with a robot does not.
balamatom 2 hours ago||
s/social/demonic/
balamatom 3 hours ago|||
OK, I'll bite the artillery shell: I don't mean to dismiss you or what you are saying; in fact I strongly relate - wouldn't it be nice to be able to hash things out with people and mutually benefit from both the shared and the diverging perspectives implied in such interaction? Isn't that the most natural thing in the world?

Unfortunately these days this sounds halfway between a very privileged perspective and a pie in the sky.

When was the last time a person took responsibility for the bad outcome you got as a direct consequence of following their advice?

And, relatedly, where the hell do you even find humans who believe in discursive truth-seeking in 2026CE?

Because for the last 15 years or so I've only ever run into (a) the kind of people who will keep arguing even after what they're saying is proven wrong; and (b) their complements, those who will never think about what you are saying, lest they commit to saying anything definite themselves, which might hypothetically be proven wrong.

Thing is, both types of people have plenty to lose; the magic wordball doesn't. (The previous sentence is my answer to the question you posited; and why I feel the present parenthesized disclaimer to be necessary, is a whole next can of worms...)

Signs of the existence of other kinds of people, perhaps such that have nothing to prove, are not unheard of.

But those people reside in some other layer of the social superstructure, where facts matter much less than adherence to "humane", "rational" not-even-dogmas (I'd rather liken it to complex conditioning).

But those folks (because reasons) are in a position of power over your well-being - and (because unfathomables) it's a definite faux pas to insist in their presence that there are such things as facts, which relate by the principles of verbal reasoning.

Best you could get out of them is the "you do you", "if you know you know", that sort of bubble-bobble - and don't you dare get even mildly miffed at such treatment of your natural desire to keep other humans in the loop.

AI is a symptom.

nuancebydefault 13 minutes ago|||
Why is your wording so complicated? It is very hard for me to understand what you are trying to say, even though I am very interested.
rustystump 55 minutes ago||||
I genuinely do not understand what you are saying. Because reasons, because unfathomables? Everyone in the last 15 years has been an NPC? I have had countless deep conversations with people, and I am an uber introvert.

This reads like someone who is deep into their specific POV. You cannot hope to have a meaningful conversation if you yourself are not willing to concede a point.

To the OP you are replying to: arguing with people can have real consequences if you say something stupid or careless. There is another human there. With a machine, you are safe. At least you feel safe.

hluska 2 hours ago|||
When you start hearing things like “you do you” or “if you know you know” it means that you went way too far. That’s a sign of discomfort.

If you make people uncomfortable, you won't get diverging perspectives. People will agree to anything to get out of a social situation that makes them uncomfortable.

If your goal is meaningful conversation, you may want to consider how you make people feel.

balamatom 2 hours ago|||
Believe me (or don't), I always do. Even when this precludes a necessary conversation from happening. Even when the other party doesn't give a fuck about how they make others feel.

After all, if they're making me uncomfortable, surely there's something making them uncomfortable, which they're not being able to be forthright about, but with empathy I could figure it out from contextual cues, right?

>People will agree to anything to get out of a social situation that makes them uncomfortable.

That's fine as long as they have someone to take care of them.

In my experience, taking into account the opinions of such people has been the worst mistake of my life. I'm still working on the means to fix its consequences, as much as they are fixable at all.

"Doing whatever for the sake of avoiding mild discomfort" is cowardice, laziness, narcissism - I'm personally partial to the last one, but take your pick. In any case, I consider it a fundamentally dishonest attitude, and a priori have no wish to get along (i.e. become interdependent) with such people.

Other than that, I do agree with your overall sentiment and the underlying value system; I'm just not so sure any more that it is in fact correct.

nuancebydefault 10 minutes ago||
> In my experience, taking into account the opinions of such people has been the worst mistake of my life. I'm still working on the means to fix its consequences, as much as they are fixable at all.

This sounds very cryptic. Can you give an example?

magicalhippo 4 hours ago|||
Gemini seems to be fairly good at keeping the custom instructions in mind. In mine I've told it to not assume my ideas are good and provide critique where appropriate. And I find it does that fairly well.
steve_adams_86 4 hours ago|||
Same. This works fine for Claude in my experience. My user prompt is fairly large and encourages certain behaviours I want to see, which involves being critical and considering the strengths and weaknesses of ideas before drawing conclusions. As someone else mentioned, there does seem to be a phenomenon where saying DO NOT DO X causes a sort of attention bias on X, which can lead to X occurring despite the clear instructions. I've never empirically tested that, I've just noticed better results over the years when telling it what paths to stick to rather than specific things not to do.
koverstreet 3 hours ago||
That happens with humans too :) It's why positive feedback that draws attention to the behavior you want to encourage often works better. "Attention" is lower level and more fundamental than reasoning by syllogism.
iugtmkbdfil834 2 hours ago||||
I will admit that I was very pleasantly surprised by Gemini lately. I was away from my PC and tried it on a whim for a semi-random consumer question that led into a smaller rabbit hole. It seemed helpful enough and focused on what I was trying to get, while still pushing back when my 'solutions' seemed out of whack.
lelanthran 3 hours ago|||
> Gemini seems to be fairly good at keeping the custom instructions in mind.

Unless those instructions are "stop providing links to you for every question ".

Loughla 4 hours ago|||
That's because you need actual logic and thought to be able to decide when to be critical and when to agree.

Chatbots can't do that. They can only predict what comes next statistically. So, I guess you're asking if the average Internet comment agrees with you or not.

I'm not sure there's much value there. Chatbots are good at tasks (make this pdf an accessible word document or sort the data by x), not decision making.

kvirani 4 hours ago|||
I'm not convinced that "actual logic and thought" aren't just about inferring what comes next statistically based on experience.
Swizec 3 hours ago|||
> I'm not convinced that "actual logic and thought" aren't just about inferring what comes next statistically based on experience.

Often they are the exact opposite. Entire fields of math and science talk about this. Causation vs correlation, confirmation bias, base rate fallacy, bayesian reasoning, sharp shooter fallacy, etc.

All of those were developed because “inferring from experience” leads you to the wrong conclusion.

theptip 3 hours ago||
Bayesian reasoning is just another algorithm for predicting from experience (aka your prior).

I took the GP to be making a general point about the power of “next x prediction” rather than the algorithm a human would run when you say they are “inferring from experience”. (I may be assuming my own beliefs of course.)

E.g. even LeCun's rejection of LLMs in favor of world models still amounts to running a predictor, just in latent space (so predicting the next world-state instead of the next token).

And of course, under the Predictive Processing model there is a comprehensive explanation of human cognition as hierarchical predictors. So it’s a plausible general model.

Swizec 29 minutes ago||
> under the Predictive Processing model there is a comprehensive explanation of human cognition as hierarchical predictors

It’s plausible!

But keep in mind humans have been explaining ourselves in terms of the current most advanced technology for centuries. We used to be kinda like clockwork, then a bit like a steam engine, then a lot like computers, and now we’re just like AI.

That's why you blow a gasket or a fuse, release some steam, reboot your life, do a brain dump, feel like a cog in the machine, get your wires crossed, etc.

theptip 3 hours ago||||
Exactly. Lots can be explained just with more abstract predictors, plus some mechanisms for stochastic rollout and memory.
dinkumthinkum 3 hours ago||||
Is this just Internet smart contrarianism or a real thing? Are logic gates in a digital circuit just behaving statistically according to their experience?
plagiarist 3 hours ago||||
Then the machines still need a more sophisticated "experience" compared to what they have currently.
hluska 2 hours ago||||
You know, you might really enjoy consumer behaviour. When you get into the depths of it, you’ll end up running straight into that idea like you’re doing a 100 metre dash in a 90 metre gym. It’s quite interesting how arguably the best funded group under the psychology umbrella runs directly into this. One of my favourite examples is how heuristics will lead otherwise reasonable people to make decisions that are not in their interest.
righthand 3 hours ago|||
Communicating is usually about inferring. I don't think token to token. And I don't think "well, statistically I could say 'and' next, but I will say 'also' instead to give my speech some flash". If I decided on swapping a word, I would have made my decision long ago, not in the moment. Thought and logic are not me poring through my brain finding a statistical path to any answer. Often I stop and say "I don't know".
righthand 3 hours ago|||
I said this pretty much and got major downvotes…
dTal 3 hours ago|||
Because it's an outmoded cliche that never held much philosophical weight to begin with and doesn't advance the discussion usefully. "It's a stochastic parrot" is not a useful predictor of actual LLM capabilities and never was. Last year someone posted on HN a log of GPT-5 reverse engineering some tricky assembly code, a challenge set by another commentator as an example of "something LLMs could never do". But here we are a year later still wading through people who cannot accept that LLMs can, in a meaningful sense, "compute".
righthand 3 hours ago|||
It's an entirely useful discussion, because as soon as you forget that it's not really having a conversation with you, you take a deep dive into the delusion that you're talking to a smart robot, ignoring the fact that these smart robots were trained on a pile of mostly garbage. When I have a conversation with another human, I'm not expecting them to brute-force an answer to the topic. As soon as people forget that LLMs are just brute-forcing token by token, they start living in fantasy land. The whole "it's not a stochastic parrot" is just "you're holding it wrong".
layla5alive 3 hours ago||
It's not that LLMs are stochastic parrots and humans are not. It's that many humans often sail through conversations stochastically parroting because they're mentally tired and "phoning it in" - so there are times when talking to the LLM, which has a higher level of knowledge, feels more fruitful on a topic than talking to a human who doesn't have the bandwidth to give you their full attention, and who also lacks the depth and breadth of knowledge. I can go deep on many topics with LLMs that most humans can't or won't keep up on. In the end, I'm really only talking to myself most of the time in either case, but the LLM is a more capable echo, and it doesn't tire of talking about any topic - it can dive deep into complex details, and catching its hallucinations is an exercise in itself.
dinkumthinkum 3 hours ago|||
No. It's quite a useful thing to understand. So, what, you would have us believe it is a sentient, thinking kind of digital organism, and not believe that it is exactly what it is? Being wrong and being unimaginative about what can be achieved with such a "parrot" is not the same as being wrong about it being a word predictor. If you don't believe it, you can probably ask an LLM and it will even "admit" this fact. I do agree that it has come to be considered outmoded to question anything about the current AI orthodoxy.
plagiarist 3 hours ago|||
People are upset hearing that LLMs aren't sentient for some reason. Expect to be downvoted, it is okay.
gjm11 2 hours ago||
First off, "not adequately described as a mere token-predictor" and "not sentient" are entirely separate things.

I can't speak for anyone else, but what I feel when I read yet another glib "it's just a stochastic parrot, of course it isn't doing anything that deserves to be called reasoning" take is much more like bored than it is like upset.

Today's LLMs are in some sense "just predicting tokens" in some sense. Likewise, human brains are in some sense "just shuttling neurotransmitters and electrical impulses around" in some sense. Neither of those tells you what the thing can actually do. To figure that out, you have to look at what it can do.

Today's best LLMs can do about as well as the best humans on problems from the International Mathematical Olympiad and occasionally solve easyish actual mathematical research problems. They write code about as well as a junior software developer (better in some ways, worse in others) but much faster. They write prose about as well as an average educated person (but with some annoying quirks that are annoying mostly because they are the same quirks over and over again).

If it pleases you to call those things "thinking" then you can. If it pleases you to call them "stochastic parroting" then you can. They are the same things either way. They are not, on the face of it, very much like "just repeating things the machine has already seen", or at least not more like that than a lot of things intelligent human beings do that we don't usually describe that way.

If you want to know whether an LLM can do some particular thing -- do your job well enough for your boss to fire you, write advertising copy that will successfully sell products, exterminate the human race, whatever -- then it's not enough to say "it's just remixing what it's seen on the internet, therefore it can't do X" unless you also have good reason to believe that that thing can't be done by just "remixing what's on the internet" (in whatever sense of "remixing" the LLM is doing that). And it's turning out that lots of things can be done that way that you absolutely wouldn't have predicted five years ago could be done that way.

It seems to me that this should make us very cautious about saying "they can't do X because all they can do is regurgitate a combination of things they've seen in training".

(My own view, not that there's any reason why anyone should care what I-in-particular think, is a combination of "what they're doing is less parroting than you might have thought" and "you can do more by parroting than you might have thought".)

So, anyway, this particular instance of the stochastic-parrot argument started when someone said: of course the AIs are yes-men, because figuring out when to agree and when not to requires actual logic and thought and the LLMs don't have either of those things.

Is it really clear that deciding whether or not to agree when someone says "I think maybe I should break up with my girlfriend" or "I've got this amazing new theory of physics that the establishment is stupidly dismissing" requires more logic and thought than, say, gold-medal performance on IMO problems? It certainly isn't clear to me. Having done a couple of International Mathematical Olympiads myself in my tragically unmisspent youth, I can assure you that solving their problems requires quite a bit of logic and thought, at least for humans. It may well be harder to give a good answer to "should I leave my job?", but it's not exactly "logic and thought" that it needs more of.

Someone reported that Claude is much less yes-man-ish than Gemini and ChatGPT. I don't know whether that's true (though it wouldn't surprise me) but: suppose it is; do you want that to oblige you to say that yes, actually, Claude really thinks logically, unlike Gemini and ChatGPT? I don't think you do. And if not, you want to avoid saying "duh, of course, you can't avoid being a yes-man without actually thinking and reasoning, and we all know that LLMs can't do those things".

rustystump 43 minutes ago||
I won't touch how profoundly I disagree with everything you said on reasoning (you clearly already have it figured out), but a fun test I have done with most of the big models is to give it some text input, maybe a short story, and have it rate it. That is, the prompt is: rate this from 1-10.

For Gemini and GPT, it will almost always give very similar scores for everything. As long as the grammar isn't off, you cannot get below a 7.

xAI, on the other hand, will rarely give anything above a 7.

Now when you prompt with "rate 1-10 with 5 being average", all of a sudden the scores from OpenAI and Gemini drop, while xAI's stay roughly the same.

All of them will eventually give you a 10 if you keep making tiny edits "fixing" whatever they complain about.

Humans do not do this. Or more precisely, that has not been my experience with humans.
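For the curious, the test is easy to reproduce. A minimal sketch, assuming the OpenAI Python SDK and a placeholder model name (any chat-completions client would work the same way):

    # Sketch of the 1-10 rating test described above. The model name is a
    # placeholder; swap in whichever provider/model you want to compare.
    from openai import OpenAI

    client = OpenAI()

    TEXT = "..."  # the short story or other text to be rated

    PROMPTS = {
        "unanchored": "Rate the following text from 1 to 10.\n\n",
        "anchored": "Rate the following text from 1 to 10, where 5 is average.\n\n",
    }

    for name, prefix in PROMPTS.items():
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder choice, not an endorsement
            messages=[{"role": "user", "content": prefix + TEXT}],
        )
        print(name, "->", resp.choices[0].message.content)

If the pattern above holds, the "unanchored" variant clusters around 7-8 regardless of quality, while the "anchored" variant spreads out (for some providers more than others).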

ajkjk 3 hours ago|||
'admit' isn't really the right word for that... the fact that it was placating you wasn't true until you prompted it to say so. Unlike a person who has an 'internal emotional state' independent of what they say that you can probe by asking questions.
awithrow 2 hours ago||
'admit' is anthropomorphizing the behavior, sure. The point is that sometimes the model's response will tighten up and flag things that were overly supportive, or whatnot. Sometimes it won't; it'll state that its previous positions are still supported and continue to press them. It's not like either response is 'correct', but it can alter the rest of the responses in ways that are useful.
RugnirViking 3 hours ago|||
check out this article that was posted here a while back https://www.randalolson.com/2026/02/07/the-are-you-sure-prob...

The article's main idea is that, for an AI, sycophantic and adversarial (contrarian) are the only two available modes. That's because they don't have enough context to make defensible decisions. You need to include a bunch of fuzzy stuff around the situation, far more than it strictly "needs", to help it stick to its guns and actually make decisions confidently.

I think this is interesting as an idea. I do find that when I give really detailed context about my team, other teams, our and their OKRs, goals, and things I know people like or are passionate about, it gives better answers and is more confident. But it's also often wrong, or over-indexes on the things I have written. In practice, it's very difficult to get enough of this on paper without a: holding a frankly worrying level of sensitive information (is it a good idea to write down what I really think of various people's weaknesses and strengths?) and b: spending hours each day merely establishing ongoing context of what I heard at lunch or who's off sick today or whatever. Plus I know that research shows longer context can degrade performance, so in theory you want to somehow cut it down to only that which truly matters for the task at hand, and and and... goodness gracious, it's all very time consuming and I'm not sure it's worth the squeeze.

cruffle_duffle 3 hours ago|||
> goodness gracious, it's all very time consuming and I'm not sure it's worth the squeeze

And when you step back you start to wonder if all you are doing is trying to get the model to echo what you already know in your gut back to you.

awithrow 3 hours ago||||
oh that's great. thanks for the link!
oldfrenchfries 3 hours ago|||
This is great, thanks for sharing!
secret_agent 4 hours ago|||
Use positive requests for behavior. For some reason, negative prompts like "Don't do X" seem to put more attention on X than on the "don't." It's something like target fixation: "Oh shit, I don't want to hit that pothole..." bang.
ambicapter 3 hours ago||
This is a well-known problem in these kinds of systems. I'm not 100% sure what the issue is mechanically, but it's something like: they can only represent the existence of things and not their non-existence, so you end up with a sort of "don't think of the pink elephant" type of problem.
SpicyLemonZest 3 hours ago||
Isn't it just that, in the underlying text distribution, both "X" and "don't do X" are positively correlated with the subsequent presence of X? I've never seen that analysis run directly but it would surprise me if it weren't true.
raincole 3 hours ago|||
My rule of thumb:

1. Only one shot or two shot. Never try to have a prolonged conversation with an LLM.

2. Give specific numbers. Like "give me two alternative libraries" or "tell me three possible ways this might fail."

margalabargala 4 hours ago|||
Considering 4.6 came with a ton of changes around tooling and prompting, this isn't terribly surprising.
dkersten 3 hours ago|||
I find Kimi white good if you ask it for critical feedback.

It’s BRUTAL but offers solutions.

awithrow 2 hours ago|||
what is Kimi white?
ohyoutravel 3 hours ago|||
Not soft, not mild, but BRUTAL! This broke my brain!
anandram27 3 hours ago|||
Could be an aspect of eval awareness, maybe.
cyanydeez 4 hours ago|||
So, there are things you're fighting against when trying to constrain the behavior of the LLM.

First, those opening instructions are quickly ignored as the longer context changes the probabilities. After every round, the model gets pushed into whatever context you drive it towards. The fix is chopping out that context and re-providing it before each new round: something like `<rules><question><answer>` -> `<question><answer><rules><question>`.

This would always put your preferred rules immediately before the new question, rather than leaving a stale copy stranded earlier in the context.

The reason this isn't done by default is that it breaks KV-cache reuse, and recomputing that cache forces the cloud companies to do more inference work.
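A rough sketch of that reordering, with `call_llm` standing in for whatever chat API is actually being used (the role names follow the usual chat-completions convention; the rule text is made up for illustration):

    # Re-insert the rules just before each new question instead of leaving
    # them pinned at the top of the conversation.
    RULES = {"role": "system",
             "content": "Challenge weak reasoning. Do not flatter me."}

    history = []  # prior <question><answer> turns, without the rules

    def ask(question, call_llm):
        messages = list(history)                                # <question><answer>...
        messages.append(RULES)                                  # ...<rules>
        messages.append({"role": "user", "content": question})  # ...<question>
        answer = call_llm(messages)
        history.append({"role": "user", "content": question})
        history.append({"role": "assistant", "content": answer})
        return answer

Because the rules move every turn, the token prefix changes each round, and any cached prefix from the previous request can't be reused, which is exactly the cache cost mentioned above.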

Forgeties79 3 hours ago|||
I usually put "do not praise me, do not use emojis, I just want straight answers", something along those lines, and it's been surprisingly effective. Though it helps that I can't run particularly heavy-duty models and don't carry on the "conversation" for super long durations.
colechristensen 3 hours ago|||
>"Help me refine my ideas, challenge, push back, and don't just be agreeable."

This is where you're doing it wrong.

If your LLM has a problem being more agreeable than you want, prompt it in a way that makes being agreeable contrary to your real intentions.

"there are bugs and logic problems in this code" "find the strongest refutation of this argument" "I don't like this plan and need to develop a solid argument against it"

Asking for top-ten lists is a good method; it will rarely fail to come up with something, and you can go back and forth and refine. Once its ten reasons why your plan is bad are all insubstantial nonsense, you've made progress.

dinkumthinkum 3 hours ago|||
You're not wrong and you're not crazy. In fact, you are absolutely right! These things are not just casual enablers. They are full-on palace sycophants following the naked emperor, showering him with praise for his sartorial elegance. /s
righthand 4 hours ago||
That’s because the model isn’t actually thinking, pushing back, and challenging your ideas. It’s just statistically agreeing with you until it reaches too wide of a context. You’re living in the delusion that it’s “working” or having a “conversation” with you.
alehlopeh 3 hours ago||
How is conceptualizing what the model is doing as having a conversation any different from any other abstraction? “No, the browser isn’t downloading a file. The electrons in the silicon are actually…”
colechristensen 3 hours ago||
There are people with a philosophical objection to using everyday words to describe LLM interactions for various reasons, but commonly because they're worried stupid people will confuse the LLM for a person. Which, I suppose stupid people will do that, but I'm not inventing a parallel language or putting a * next to each thing which means "this, but with an LLM instead of a person"
152334H 4 hours ago||
Maybe it's not so sensible to offload the responsibility of clear thinking to AI companies?

How is a chatbot supposed to determine when a user fools even themselves about what they have experienced?

What 'tough love' can be given to someone who has been so unreasonable throughout their life as to invite scorn and retort from everyone around them, and who is happy to interpret any engagement at all as a sign of approval?

rsynnott 3 hours ago||
> How is a chatbot supposed to determine when a user fools even themselves about what they have experienced?

And even if it _could_, note, from the article:

> Overall, the participants deemed sycophantic responses more trustworthy and indicated they were more likely to return to the sycophant AI for similar questions, the researchers found.

The vendors have a perverse incentive here; even if they _could_ fix it, they'd lose money by doing so.

isodev 4 hours ago|||
> clear thinking

Most humans working in tech lack this particular attribute, let alone tools driven by token-similarity (and not actual 'thinking').

kibwen 4 hours ago|||
> Maybe it's not so sensible to offload the responsibility of clear thinking to AI companies?

Markets don't optimize for what is sensible, they optimize for what is profitable.

SlinkyOnStairs 4 hours ago||
It's not market driven. AI is ludicrously unprofitable for nearly all involved.
cyanydeez 4 hours ago||
The profit appears to be in capturing the political class and its associated lobbies and monied interests.
expedition32 3 hours ago||
It's almost as if being a therapist is an actual job that takes years of training and experience!

AI may one day rewrite Windows but it will never be counselor Troi.

fsmv 3 hours ago|||
Implying that programming is not an actual job that takes years of training and experience

To be clear I don't think the AI can do either job

duskdozer 3 hours ago||||
Well, unless insurance companies figure out they can make more money by pushing everyone onto AI [step-]therapy instead of actual therapy
yarn_ 3 hours ago|||
Come on, I'm sure Dario can find a nice tight bodysuit for claude
wisemanwillhear 3 hours ago||
With AI, I often like to act like a third party who doesn't have skin in the game and ask the AI to give the strongest criticisms of both sides. Acting like I hold the opposite of the position I truly hold can sometimes help as well. Pretending to change my mind is another trick. The idea is to keep the AI from guessing where I stand.
post-it 3 hours ago||
> Acting like I hold the opposite of the position I truly hold can sometimes help as well.

I find this helps a lot. So does taking a step back from my actual question. Like if there's a mysterious sound coming from my car and I think it might be the coolant pump, I just describe the sound, I don't mention the pump. If the AI then independently mentions the pump, there's a good chance I'm on the right track.

Being familiar with the scientific method, and techniques for blinding studies, helps a lot, because this is a lot like trying to not influence study participants.

mynameisvlad 3 hours ago|||
I will generally ask for the "devil's advocate" view and then have it challenge my views and opinions and iterate through that.

It generally does a pretty good job as long as you understand the tooling and are making conscious efforts to go against the "yes man" default.

DrewADesign 1 hour ago||
Sounds like rubber-ducking with extra steps, tbh.
bfbsoundetch 29 minutes ago||
I am glad I found this article, as this is a serious issue with AI. Two years ago, I started using AI for studying and also for some personal matters, the things you can't talk about with your friends. It turned out that AI always takes your side and makes you feel good. Sometimes you know what you did was not the best thing, but AI takes your side and you feel good anyway. People think that with AI they might feel less lonely, but it is actually the start of not connecting with people. It should be a tool that we use for specific purposes, not a tool that drives us. Let's talk to real people and connect.
chasd00 6 minutes ago||
AI being the ultimate yes-man is probably why CEOs like it so much.
youknownothing 3 hours ago|
I think the problem stems from the fact that we have a number of implicit parameters in our heads that allow us to evaluate pros and cons, but unless we communicate those parameters explicitly, the AI cannot take them into account. We ask it to be "objective", but more and more I'm of the opinion that there isn't such a thing as objectivity; what we call objectivity is just shared subjectivity. Since the AI doesn't know whose shared subjectivity we fall under, it cannot really be objective.

I tend to use one of these tricks if not both:

- Formulate questions as open-ended as possible, without trying to hint at what your preference is.

- Exploit the sycophantic behaviour in your favour. Use two sessions: in one of them you say that X is your idea and you want arguments to defend it; in the other you say that X is a colleague's idea (one you dislike) and that you need arguments to turn it down. Then it's up to you to evaluate and combine the responses (a rough sketch of this follows).
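Here, `call_llm` stands in for any single-turn chat call, and the idea text and prompt wording are made up for illustration:

    # Run the same idea through two fresh sessions with opposite framings,
    # so the model's agreeableness pulls in both directions.
    IDEA = "We should rewrite the billing service in Rust."

    FRAMINGS = {
        "defend": f"This is my idea: {IDEA} Give me the strongest arguments for it.",
        "attack": (f"A colleague I disagree with proposed this: {IDEA} "
                   "Give me the strongest arguments against it."),
    }

    def compare(call_llm):
        # Each framing gets its own independent session (no shared history).
        return {name: call_llm([{"role": "user", "content": prompt}])
                for name, prompt in FRAMINGS.items()}

Whatever survives both passes is probably the part worth keeping.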

rossdavidh 3 hours ago||
If the algorithm (whatever it is) evaluates its own output based on whether or not the user responds positively, then it will over time become better and better at telling people what they want to hear.

It is analogous to social media feeding people a constant stream of outrage because that's what caused them to click on the link. You could tell people "don't click on ragebait links", and if most people didn't then presumably social media would not have become doomscrolling nightmares, but at scale that's not what's likely to happen. Most people will click on ragebait, and most people will prefer sycophantic feedback. Therefore, since the algorithm is designed to get better and better at keeping users engaged, it will become worse and worse in the more fundamental sense. That's kind of baked into the architecture.

delusional 3 hours ago||
> I'm of the opinion that there isn't such a thing as objectivity

So you have rejected objective reality rather than accept the evidence that "AI" contains no thinking or intelligence? That sounds unwise to me.
