Posted by salkahfi 4 days ago

How does misalignment scale with model intelligence and task complexity?(alignment.anthropic.com)
241 points | 79 comments
lewdwig 4 days ago|
I guess it’s reassuring to know Hanlon’s Razor holds for AGI too.
kalap_ur 3 days ago||
Well, this sounds like a "no shit Sherlock" statement: >>Finding 3: Natural "overthinking" increases incoherence more than reasoning budgets reduce it We find that when models spontaneously reason longer on a problem (compared to their median), incoherence spikes dramatically. Meanwhile, deliberately increasing reasoning budgets through API settings provides only modest coherence improvements. The natural variation dominates.<<

Language models are probabilistic, not deterministic. Therefore incoherence _by definition_ increases as a response becomes lengthier. This is not true for humans, who tend to act/communicate deterministically. If I ask a human to read a PDF and then ask whether the word "paperclip" appears in it, they will deterministically give a yes/no answer, and no matter how many times we repeat the process they will give the same answer consistently (and not due to autocorrelation, because this can be done across different humans). LMs give a probabilistic response that depends on the training itself: a very well trained model might be right 99% of the time, which means that out of 100 runs it will give the wrong answer once. We have no real measure of this probabilistic component for LMs, but simulations could be done to research it. I would also be very curious about autocorrelation in models: if a human did a task and concluded "yes", they will keep answering "yes" to the same task, just with an increasing amount of eye-rolling.
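Something like the sketch below is the kind of simulation I have in mind. It is only a rough sketch; ask_model is a hypothetical stand-in for a call to whatever model you are testing, not any specific API:

    # Estimate how consistently a model answers the same yes/no question.
    # ask_model is a hypothetical placeholder for a call to the LM under
    # test, sampled at whatever temperature it normally runs at.
    from collections import Counter

    def answer_consistency(ask_model, question, n=100):
        answers = Counter(ask_model(question).strip().lower() for _ in range(n))
        top_answer, count = answers.most_common(1)[0]
        return top_answer, count / n   # e.g. ("yes", 0.99) for a well-trained model

    # A human asked the paperclip-in-the-PDF question would score 1.0 here;
    # the interesting empirical question is how far below 1.0 an LM lands.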

Also, imagine the question "is the sky blue?" Answer 1: "Yes." This has 0 incoherence. Answer 2: "Yes, but sometimes it looks black, sometimes blue." While this particular answer also seems to have 0 incoherence, the probability of incoherence is larger than 0, because the longer generation contains more probabilistic steps. Answer generation by humans is not probabilistic.

Therefore, probability-driven LMs (and all LMs today are probability-driven) will always exhibit higher incoherence than humans.
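To make the length point concrete, here is a back-of-the-envelope calculation. It assumes each token is an independent chance of going off the rails, which is a simplification, but it shows why longer responses are more exposed:

    # If each generated token independently has a small probability p of
    # being "incoherent", then a response of n tokens contains at least one
    # incoherent step with probability 1 - (1 - p)**n, which grows with n.
    def p_any_incoherence(p_per_token, n_tokens):
        return 1 - (1 - p_per_token) ** n_tokens

    for n in (10, 100, 1000):
        print(n, round(p_any_incoherence(0.001, n), 3))
    # -> roughly 0.01, 0.095, and 0.632 for n = 10, 100, 1000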

I wonder if anybody would disagree with the above.

hogehoge51 4 days ago||
My ignorant question: they looked at bias and variance noise; how about quantisation noise? I feel like agents are sometimes "flip-flopping" between metastable, divergent interpretations of the problem or solution.
cyanydeez 4 days ago||
[flagged]
root_axis 4 days ago||
This is very interesting research and a great write up.

I just want to nitpick something that really annoys me and has become extremely common: the tendency to take every opportunity to liken all qualities of LLMs to humans. Every quirk, failure, oddity, limitation, or implementation detail is relentlessly anthropomorphized. It's gotten to the point where many enthusiasts have convinced themselves that humans think by predicting the next token.

It feels a bit like a cult.

Personally, I appreciate more sobriety in tech, but I can accept that I'm in the minority in that regard.

IgorPartola 4 days ago||
For some reason the article reads to me like “AI is not evil, it just has accidents when it loses coherence.” Sounds a lot like liability shifting.
dmix 4 days ago|
They compared it to industrial accidents. I don't think a software company would try to shift liability by comparing itself to factory explosions and chemical spills.
gnarlouse 4 days ago||
I feel vindicated when I say that the superintelligence control problem is a total farce: we won't get to superintelligence, and believing otherwise is tantamount to a religious belief. The real problem is the billionaire control problem. The human-race-on-earth control problem.
MrOrelliOReilly 4 days ago||
I don’t believe the article makes any claims on the infeasibility of a future ASI. It just explores likely failure modes.

It is fine to be worried about both alignment risks and economic inequality. The world is complex and there are many problems at once; we don't have to promote one at the cost of the other.

HNisCIS 4 days ago||
Yeah, article aside, looking back on all the AGI stuff from the last year or so really puts our current moment in perspective.

This whole paradigm of AI research is cool and all, but it's ultimately a simple machine that probabilistically forms text. It's really good at making stuff that sounds smart, but, like an AI-generated picture, it falls apart the harder you look at it. It's good at producing stuff that looks like code and often kind of works, but based on the other comments in this thread, I don't think people really grasp how these models work.

throwpoaster 4 days ago||
[flagged]
dang 4 days ago|
Could you please stop posting unsubstantive comments and flamebait? You've unfortunately been doing it repeatedly. It's not what this site is for, and destroys what it is for.

If you wouldn't mind reviewing https://news.ycombinator.com/newsguidelines.html and taking the intended spirit of the site more to heart, we'd be grateful.

throwpoaster 3 days ago||
I will, I apologize, and I love the site.

I do try to contribute constructively but am annoyed at getting downvote-hammered by what I perceive as an echo chamber.

It is very possible that I lack the social skills to understand how what I am doing is inappropriate. I will read the guidelines.

Sorry, and thanks for your efforts.

tsunamifury 4 days ago|
I don’t know why this seems so hard for these guys to understand: you score every step of a new strategy by how much it closes the distance to the goal, and if you have multiple generated forward options with no clearly best weight, you spawn a new agent and explore multiple paths. Then you score all the terminal branches and prune.

LLMs aren’t constrained to linear logic like your average human.
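
Roughly, that loop looks like the toy sketch below. propose_steps and score are hypothetical placeholders for whatever the model/agent actually supplies; this is just an illustration of score-and-prune search, not anyone's real API:

    # Toy score-and-prune (beam) search over candidate strategies.
    # propose_steps(path) -> possible next steps; score(path) -> how much
    # the path closes the distance to the goal. Both are hypothetical.
    def search(start, propose_steps, score, depth=3, beam=4):
        frontier = [[start]]                       # each path is a list of steps
        for _ in range(depth):
            candidates = []
            for path in frontier:
                for step in propose_steps(path):   # multiple forward options
                    candidates.append(path + [step])
            if not candidates:
                break
            candidates.sort(key=score, reverse=True)
            frontier = candidates[:beam]           # prune to the best branches
        return max(frontier, key=score)            # best terminal branch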