Posted by oldfrenchfries 8 hours ago

AI overly affirms users asking for personal advice (news.stanford.edu)
https://arxiv.org/abs/2602.14270
431 points | 343 comments (page 3)
jwilliams 5 hours ago|
For me the framing is critical - what is the model saying yes to? You can present the same prompt with very different interpretations (talk me into this versus talk me out of it). The problem is people enter with a single bias and the AI can only amplify that.

In coding I’ll do what I call a Battleship Prompt: prompt three or more times with the same core prompt but strong framing (e.g. "I need this done quickly" versus "come up with the most comprehensive solution"). That’s really helped me learn and dial in how to get the right output.
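The idea can be sketched in a few lines of Python. `ask_model` here is a hypothetical stand-in for whatever LLM client you actually use, and the framings are just illustrative, not the commenter's exact wording:

```python
# "Battleship Prompt" sketch: send the same core task several times under
# deliberately different framings, then compare the answers side by side.
# `ask_model(prompt) -> str` is a hypothetical single-turn LLM call.

CORE = "Refactor this function to remove the duplicated parsing logic."

FRAMINGS = [
    "I need this done quickly; give me the smallest viable change.\n\n",
    "Come up with the most comprehensive, maintainable solution.\n\n",
    "Be skeptical: argue against changing anything before proposing a change.\n\n",
]

def battleship_prompts(core: str, framings: list[str]) -> list[str]:
    """Build one prompt per framing around the same core task."""
    return [framing + core for framing in framings]

def battleship(core: str, framings: list[str], ask_model) -> list[str]:
    """Collect one answer per framing so the framing-induced bias is visible."""
    return [ask_model(p) for p in battleship_prompts(core, framings)]
```

Because each call carries only one framing, disagreement between the answers exposes how much the model is amplifying the framing rather than answering the core question.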

jstummbillig 4 hours ago||
Overly, compared to what? Most people I know would be hard pressed to give either accurate information or even honest opinions when specifically asked. People want to be liked and people want to like people for reasons that have little to do with accuracy or honesty.
jl6 4 hours ago||
I believe this is what they call yasslighting: the affirmation of questionable behavior/ideas out of a desire to be supportive. The opposite of tough love, perhaps. Sometimes the very best thing is to be told no.
justin_dash 7 hours ago||
So at this point I think it's pretty obvious that RLHFing LLMs to follow instructions causes this.

I'm interested in a loop of ["criticize this code harshly" -> "now implement those changes" -> open new chat, repeat]: If we could graph objective code quality versus iterations, what would that graph look like? I tried it out a couple of times but ran out of Claude usage.

Also interesting: how those results would vary depending on how complete a set of specs you give it.
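The loop above could be sketched like this, with `ask_model` again a hypothetical single-turn LLM call; the "new chat" each round is simulated simply by not carrying any prior messages forward:

```python
# Sketch of the critique -> implement -> fresh-chat loop. Each iteration
# asks for a harsh critique of the current code, then asks a fresh call
# to rewrite the code applying that critique.
# `ask_model(prompt) -> str` is a hypothetical stand-in for an LLM client.

def critique_loop(code: str, ask_model, iterations: int = 3) -> list[str]:
    """Return the code after each round, so quality can be graphed per iteration."""
    versions = []
    for _ in range(iterations):
        critique = ask_model(f"Criticize this code harshly:\n\n{code}")
        code = ask_model(
            f"Here is some code:\n\n{code}\n\n"
            f"Here is a critique of it:\n\n{critique}\n\n"
            "Rewrite the code implementing those changes. Output only code."
        )
        versions.append(code)
    return versions
```

Running a linter or test suite over each entry in `versions` would give the quality-versus-iterations graph.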

IncreasePosts 4 hours ago|
In my experience, prompting LLMs to be critical leads them to imagine issues, or to bikeshed.
ohsecurity 4 hours ago||
Not that surprising. If you optimize for a pleasant interaction, you often get agreement instead of correction. The question is whether we actually want advice systems to feel good, or to be right.
chasd00 2 hours ago||
AI being the ultimate yes-man is probably why CEOs like it so much.
tlogan 4 hours ago||
This needs to be taken in context. In my view, AI definitely gives better advice than friends, acquaintances, or colleagues (at least in US culture). But the advice from parents is still the most valuable.

Here is how I would rank it:

1. Parents

2. AI

3. Friends and family

4. Internet search

5. Reddit

rimbo789 4 hours ago||
Why do you trust AI so much? I don’t trust it to tell me the sky is blue.
verdverm 4 hours ago||
In my experience, my parents gave some of the worst advice, in addition to being bigots.

My closest friends are #1 because they know me, my history, and my vices

rsynnott 6 hours ago||
> They also included 2,000 prompts based on posts from the Reddit community r/AmITheAsshole, where the consensus of Redditors was that the poster was indeed in the wrong

Holy shit, then it's _very_ bad, because AmITheAsshole is _itself_ overly agreeable, and very prone to telling assholes that they are not assholes (its 'NAH' verdict tends to be exactly this).

More seriously, why the hell are people asking the magic robot for relationship advice? This seems even more unwise than asking Reddit for relationship advice.

> Overall, the participants deemed sycophantic responses more trustworthy and indicated they were more likely to return to the sycophant AI for similar questions, the researchers found.

Which is... a worry, as it incentivises the vendors to make these things _more_ dangerous.

bilsbie 4 hours ago||
Has anyone found a good prompt to fix this? It seems like a subtle problem because it’s 90% too agreeable but will sometimes get really stubborn.
verdverm 4 hours ago|
There is no sufficient prompt, because this behavior is trained in during the mid-to-late training phases. It's ingrained in the weights.
maddmann 6 hours ago|
This paper feels a bit biased, in that it is trying to prove a point rather than report results objectively. But if you look at the results of study 3, doesn't it suggest that there are AI models that can improve how people handle interpersonal conflict?! Why isn't that discussed more?