Posted by Tomte 10 hours ago

The sigmoids won't save you(www.astralcodexten.com)
87 points | 122 comments
noosphr 14 minutes ago|
This article answers the question in the second paragraph, then completely ignores the answer for the rest of it.

>My understanding is that this represents 3-4 “generations” of different technology (propellers, turbojets, etc). Each technology went through normal iterative improvement, then, when it reached its fundamental limits, got replaced by a better technology. The last technology, ramjets, reached its limit at about 3500 km/h, and there wasn’t the economic/regulatory will to develop anything better, so the record stands.

You don't have one sigmoid; you have multiple sigmoids, each stacked on top of the last. Airplanes aren't just one technology; they're multiple technologies that happen to do the same thing.

Each one is following a sigmoid perfectly. It only looks exponential(ish) because of unpredictable discoveries that let you switch to another sigmoid that has a higher maximum potential.

The same is true in AI. If you used the same architecture as GPT-2 today, you'd be in for a bad time training a new frontier model. It's only because we've had dozens of breakthroughs that the capabilities of models have improved as much as they have.

That said, exponentials and sigmoids are the wrong models to use for growth. Growth is a differential equation. It has independent inputs, it has outputs, and some of those outputs feed back in as inputs. What happens depends entirely on the specific DE that governs the given technology. We can easily have a chaotic system with completely random booms and busts that have no deep fundamental rhyme or reason. We currently call that the economy.
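A toy numerical sketch of the stacked-sigmoid point (all numbers invented): take a few logistic curves, each with roughly 5x the ceiling of the previous one and arriving a decade later, and let "capability" be the best available technology at each moment. The envelope grows by a near-constant factor per decade, i.e. looks exponential, right up until no new sigmoid arrives.

  import numpy as np

  def logistic(t, ceiling, midpoint, rate=1.0):
      # one technology "generation": an S-curve saturating at `ceiling`
      return ceiling / (1.0 + np.exp(-rate * (t - midpoint)))

  t = np.linspace(0, 40, 401)
  # hypothetical generations: each has 5x the previous ceiling, arriving 10 units later
  generations = [logistic(t, 10 * 5**k, -5 + 10 * k) for k in range(4)]
  capability = np.max(generations, axis=0)  # the best available technology wins

  for ti in (10, 20, 30, 40):
      i, j = ti * 10, (ti - 10) * 10
      ratio = capability[i] / capability[j]
      print(f"t={ti:2d}  capability={capability[i]:7.1f}  growth over last 10 units: {ratio:4.1f}x")

The ratio stays near 5x per decade while a new generation keeps appearing, then collapses to ~1x after the last one saturates.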

Sniffnoy 11 minutes ago|
Yes, I was surprised he never discussed the idea that such exponentials are typically made of stacked sigmoids.

That said... if the exponential is made of stacked sigmoids, it's still an exponential on the whole! The fact that it's made of stacked sigmoids is relevant to the engineers making it, but not so relevant to the users or those otherwise affected by it.

noosphr 5 minutes ago||
Only so long as you can keep inventing the next sigmoid in the stack.
andy99 49 minutes ago||
AI has scaled well according to convenient measures. Neural networks have the property that whatever task you define, they can rapidly be trained to master it. We’re able to show that various tasks of increasing complication do not require intelligence and can be framed as autoregressive RL problems. I personally don’t think AI is any closer to sentient intelligence than LeNet; it’s almost trivially clear, since we know how it works. So we’re measuring something orthogonal: basically how well a universal function approximator can fit a function we define, given arbitrary computing power, and calling that progress. What will be really interesting is if we’re able to find a way to properly measure what they can’t do and what’s different about real intelligence.

Edit: in particular I don’t agree with

  But if someone claims that the trend toward increasing AI capabilities will never reach some particular scary level... 
To accept that, one has to agree that the benchmark results are getting “scarier”, which is not automatically implied by finding more goals to optimize for.
ordu 22 minutes ago|
> We’re able to show that various tasks of increasing complication do not require intelligence and can be framed as autoregressive RL problems.

The important thing is that we can show this only in hindsight. We don't know which other tasks we are currently mistaken about requiring intelligence. Maybe none of them do?

We don't know. We don't know what intelligence is. If we look at decades and even centuries of attempts to define intelligence, it all looks like moving the goalposts. When a definition of intelligence starts to include people or things we don't like to think of as intelligent, we change the definition.

dreambuffer 1 hour ago||
FYI: The author has predicted that "AGI" will be here in 1-2 years and has staked his public reputation on it. He is personally invested in trendlines being lindy rather than sigmoid.

I don't think you can use lindy on trends as if trends are static objects, but that's another conversation.

throwawayk7h 58 minutes ago||
Mind you, he is only personally invested insofar as he's staked his reputation on it. Throughout his writing, he expresses the same point over and over again: he desperately wants AI to slow down, advocates for policies that would slow it down, and most likely nothing would bring him greater peace than seeing a sigmoid curve appear.
sigmoid10 12 minutes ago|||
AGI has become such a meaningless, nondescript term that arguing over when or whether it is here has become pointless. Even OpenAI caved and removed their AGI clause from their contract with Microsoft because they weren't fully sure that we are not there yet. The original ARC-AGI was hailed as proof that AGI is not here yet, but now that ARC 1 and 2 have been saturated, no one wants to consider that perhaps we crossed the point where average humans are getting left behind. Frontier models are primarily limited by context and modality at this point, not by intelligence.
woeirua 44 minutes ago|||
Ok, but you can just look at the METR curve. Models have saturated the 50% time horizon. The 80% horizon is now at 3 hours. The rate of progress is accelerating, not slowing down. There’s no indication yet that this is a sigmoid!
Sniffnoy 40 minutes ago|||
> FYI: The author has predicted that "AGI" will be here in 1-2 years and has staked his public reputation on it. He is personally invested in trendlines being lindy rather than sigmoid.

I mean, that's called "having an opinion".

paulpauper 36 minutes ago||
He wrote articles arguing that pro-AI people are dismissive of risks, or even suggesting that they are intellectually lazy. He's taken a side. If he's wrong, I would hope he owns up to it.
Sniffnoy 13 minutes ago||
> He's taken a side.

Yes, that's called "having an opinion". Typically people writing argumentative pieces are doing so because they have a belief about the matter. I'm not sure what exactly you expect here.

> if he's wrong I would hope he owns up to it

I think Scott Alexander is pretty good about that.

paulpauper 42 minutes ago||
He only has 1.5 more months. If he's wrong he needs to own it. Same for Eliezer Yudkowsky. But these people have too much riding on their brands. No one has the courage to fess up to being wrong. Given how many podcasts he and others have been on professing this belief, it will be hard to just pretend otherwise.
btilly 4 hours ago||
Lindy’s Law is an absolute gem that I'm keeping.

If we don't understand the fundamental limits to any particular kind of trend, our default assumption should be that it will continue for about as long as it has gone on already.

We can, in fact, easily put a confidence interval on this. With 90% odds, we're not in the first 5% of the trend or the last 5% of it. Therefore it will probably go on for between 1/19th and 19 times as long as it already has, with a median of exactly as long as it has gone on so far.
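A quick simulation of that interval, assuming our vantage point is uniformly distributed over the trend's total lifetime (which is what the heuristic amounts to):

  import numpy as np

  rng = np.random.default_rng(0)
  age = 10.0                       # the trend has run for 10 years so far (arbitrary)
  u = rng.uniform(size=1_000_000)  # assume we sit at a uniformly random fraction of its life
  remaining = age * (1 - u) / u    # years still to go, given we're at fraction u

  print("median remaining:", np.median(remaining))              # ~ age, i.e. ~10 years
  print("90% interval:", np.quantile(remaining, [0.05, 0.95]))  # ~ [age/19, 19*age]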

This is deeply counterintuitive. When we expect something to last a finite time, every year it goes on brings us a year closer to when it stops. But every year that it goes on actually brings the expectation that it will go on for a year longer still.

We're looking at a trend. We believe that it will be finite. Our intuition says that every year spent is a year closer to the end. But our expectation becomes that every year spent means it will last yet another year more!

How can we apply that? A simple example is stocks. How long should we expect a rapidly growing company to continue growing rapidly?

cortesoft 1 hour ago||
I feel like Lindy's law doesn't work for things whose observation is partly controlled by the thing itself.

For example, take something like a fad or trend; it doesn't have a hard end date the way a human lifespan does, so it should follow Lindy's law.

However, the likelihood, averaged across the population, that you observe a trend is going to be higher at the end of the trend's lifecycle than at the beginning. This is baked into the definition: more and more people hear about a trend over time, so the largest number of observers will be at the end of the lifecycle, when its popularity peaks.

In other words, if you are a random person, finding out about a trend likely means it is near the end rather than the middle.

tsimionescu 59 minutes ago|||
While this is very fun as a mathematical exercise, it's completely irrelevant as a real tool for getting a better understanding of unknown processes in the real world.

The law only applies for certain types of processes, and is completely wrong for other types (e.g. a human who has lived 50 years may live 50 more, but one who has lived 100 years will certainly not live 100 more). So the question becomes: what type of process are you looking at? And that turns out to be exactly the question you started with: is there a fundamental limit to this growth curve, or not?

jfjfnfnttbtg 2 minutes ago||
> The law only applies for certain types of processes

Did you even read the post? It’s an estimate for the case where you have zero information on which to base a more accurate one. The author’s point is that if you’re making a different estimate, you need to actually say what information is informing it.

Human lifespan is obviously not a case where we have zero information, so what is your point in bringing that up?

jerf 4 hours ago|||
It's an interesting idea, and it may be something that could be mathematically justified, but I do think this is an abuse of Lindy's Law in the absence of such a justification. Per Wikipedia [1]:

"The Lindy effect applies to non-perishable items, like books, those that do not have an "unavoidable expiration date"."

And later in the article you can see the mathematical formulation which says the law holds for things with a Pareto distribution [2]. I'd want to see some sort of good analysis that "the life span of exponential growth curves" is drawn from some Pareto distribution. I don't think it's completely out of the question. But I'm also nowhere near confident enough that it is a true statement to casually apply Lindy's Law to it.

[1]: https://en.wikipedia.org/wiki/Lindy_effect

[2]: https://en.wikipedia.org/wiki/Pareto_distribution
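For what it's worth, the Pareto assumption is exactly what produces the "expected remaining life proportional to age" behaviour: for a classical Pareto with shape alpha > 1, the expected remaining lifetime given survival to age t is t/(alpha - 1). A quick Monte Carlo check of that identity (illustrative parameters only; this says nothing about whether the lifespans of growth curves are actually Pareto):

  import numpy as np

  rng = np.random.default_rng(1)
  alpha = 3.0
  lifetimes = 1.0 + rng.pareto(alpha, size=2_000_000)  # classical Pareto(alpha), minimum 1

  for age in (2.0, 5.0, 10.0):
      remaining = lifetimes[lifetimes > age] - age
      print(f"survived to {age:4.1f}: mean remaining = {remaining.mean():5.2f} "
            f"(theory: age/(alpha-1) = {age / (alpha - 1):.2f})")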

btilly 3 hours ago||
The analysis in the article explains why it applies to any phenomenon that we might be able to notice.

The argument given is the same as the one that I first ran across, not by that name, in https://www.nature.com/articles/363315a0. https://en.wikipedia.org/wiki/Doomsday_argument claims that it was a rediscovery of something that had been hypothesized a decade earlier.

I hadn't tried to give it a name, or thought to apply it outside of that context.

As for the mathematical qualms, I'm a big believer in not letting formal mathematical technicalities get in the way of adopting an effective heuristic. And the heuristic reasoning here is compelling enough that I would like to adopt it.

tsimionescu 43 minutes ago||
The argument sounds nice, but it's just wrong. It only works if most processes you're going to encounter that you know nothing about happen to be Lindy processes. If most processes happening around you that you know nothing about are not of that type, then the argument fails.
skybrian 4 hours ago|||
You can do that but you're laundering ignorance into precise-seeming mathematics. Better to just say "we're probably somewhere in the middle, not at the beginning or end" and leave it at that. Calling a peak is hard.
btilly 3 hours ago||
You speak about laundering ignorance into precise-seeming mathematics as if it was a bad thing.

But that's the entire idea of Bayesian reasoning. Which has proven to be surprisingly effective in a wide range of domains.

I'm all for quantifying my ignorance, and using it as an outside view to help guide my expectations. Read the book Superforecasting to understand how effective forecasters use an outside view to adjust their inside view, to allow them to forecast things more precisely.

throwawayk7h 56 minutes ago|||
Closely related is Laplace's Rule of Succession[1], which basically says that (absent other information) the odds of something happening next time go down the more times in a row that it doesn't happen (and vice versa).

So for example, the longer a time bomb ticks, the less likely it is to go off any time soon. (Assuming the timer isn't visible.) :)

[1] https://en.wikipedia.org/wiki/Rule_of_succession
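Concretely (assuming a uniform prior over the unknown rate): after s successes in n independent trials, the rule gives (s+1)/(n+2) as the probability of success on trial n+1. Applied to the bomb example:

  def laplace_next(successes, trials):
      # Laplace's rule of succession: posterior mean of a Bernoulli rate
      # under a uniform prior, after `successes` out of `trials`
      return (successes + 1) / (trials + 2)

  # probability the bomb goes off on the next tick, after n silent ticks
  for n in (1, 10, 100, 1000):
      print(f"{n:5d} silent ticks -> P(boom on next tick) = {laplace_next(0, n):.4f}")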

LPisGood 4 hours ago||
This is the exact same heuristic used in CPU scheduling.

We expect fresh processes to terminate quickly and long running processes to last for a while longer.

LarsDu88 4 hours ago||
I think an interesting thing about recent AI developments is that it's all happening right as we hit the diminishing-returns side of another "exponential that's actually a sigmoid": Moore's law.

The naive expectation is that AI will slow down because Moore's law is coming to an end, but if you really think about the models and how they are currently implemented in silicon, they are still inefficient as hell.

At some point someone will build a tensor processing chip that replaces all the digital matmuls with analogue logamp matmuls, or some breakthrough in memristors will start breaking down the barrier between memory and compute.

With the right level of research funding in hardware, the ceiling for AI can be very high.

paulpauper 40 minutes ago||
Moore's law is being bypassed with volume: more datacenters.
throwaway27448 4 hours ago|||
Even at orders of magnitude greater speed, we've still hit diminishing returns for quality of output. We simply haven't found anything like superhuman reasoning ability, just superhuman (potentially) reasoning speed.
energy123 3 hours ago|||
It's not that easy to assess diminishing returns with saturated benchmarks, where asymptoting to 100% is mathematically baked in. I could point to the number of Erdős problems solved by AI going from 0 to many very recently as evidence of acceleration.
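To make the saturation point concrete, here's a toy series (numbers invented) where the error rate halves every generation, i.e. the underlying rate of improvement never slows. On raw accuracy the gains shrink toward zero because the benchmark is capped at 100%, while the log-odds keep climbing by a constant amount:

  import math

  # hypothetical scores: the error rate halves with every model generation
  accuracies = [0.50, 0.75, 0.875, 0.9375, 0.96875, 0.984375]

  for prev, acc in zip([None] + accuracies, accuracies):
      gain = f"{acc - prev:+.4f}" if prev is not None else "  n/a  "
      log_odds = math.log(acc / (1 - acc))
      print(f"accuracy={acc:.4f}  raw gain={gain}  log-odds={log_odds:+.2f}")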
throwaway27448 1 hour ago||
That is not evidence of acceleration, just of some measurable improvement compared to a previous model. After all, humans have made these breakthroughs since before recorded history—that never by itself implied accelerating intelligence.
LarsDu88 2 hours ago||||
I disagree with this. Reinforcement learning with verifiable rewards is actually the secret sauce that is leading Claude and GPT to automate software engineering tasks.

All the easily verifiable domains such as mathematics, coding, and things that can be run inside a reasonable simulation are falling very very fast.

By next year if not sooner, mathematicians will be wildly outpaced by LLMs for reasoning.

Alex_L_Wood 54 minutes ago|||
Coding is anything but “easily” verifiable.
LarsDu88 8 minutes ago||
It's extremely verifiable. The reinforcement fine-tuning strategy I'm referring to involves the LLM creating coding tasks with an expected output, implementing the code, and then having a compiler (or interpreter, for languages like Python) succeed or fail at running it; the program's output is then compared to the expected output. The verification step (run the interpreter, run the test) takes seconds. One can generate millions of training examples like this essentially for free, and there is extensive research showing that, with the right policy, an agent can learn to reason: first as well as a human, and in many cases better.
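A minimal sketch of what such a verifier can look like (my own illustration of the idea, not any lab's actual pipeline; a real RLVR setup would sandbox execution and run a full test suite rather than a single stdout comparison):

  import subprocess
  import tempfile

  def verifiable_reward(generated_code: str, expected_output: str, timeout: int = 5) -> float:
      # binary reward: run the model's code and compare stdout to the expected output
      with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
          f.write(generated_code)
          path = f.name
      try:
          result = subprocess.run(["python", path], capture_output=True, text=True, timeout=timeout)
      except subprocess.TimeoutExpired:
          return 0.0  # non-terminating code fails verification
      if result.returncode != 0:
          return 0.0  # crash or syntax error fails verification
      return 1.0 if result.stdout.strip() == expected_output.strip() else 0.0

  # e.g. a task whose expected output is "120" (5 factorial)
  print(verifiable_reward("print(__import__('math').factorial(5))", "120"))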
horsawlarway 3 hours ago|||
Possibly - but we've also seen that spending more tokens on a task can improve the quality of the output (reasoning, CoT, etc).

So it's not impossible to have things that seem orthogonal, like generation speed or context length, have an impact on quality of result.

cyanydeez 4 hours ago||
they already did put a model into the silicon and it's crazy fast. https://chatjimmy.ai/

I'm pretty sure there's a 3-year design goal starting this year that'll do that to any of the Qwen, DeepSeek, etc. models. There's a lot you could do with sped-up models of this quality.

It might even be bad enough that the real bubble is how much we don't need giant data centers, when 80-90% of use cases could just be served by a silicon chip with a baked-in model rather than, as you say, a bloated SOTA model.

LarsDu88 2 hours ago|||
And this is an ASIC that still operates digitally. Imagine a chip with baked-in weights that does its math in analog, with a 20x reduction in the number of circuit elements needed for a multiplication op.

If there's a breakthrough in memristors, you could end up with another 20x reduction in circuit elements (get rid of memory bottlenecks, start doing multiplication ops as log-transformed voltage addition).

The ceiling is ultra high for how far AI can go.

clickety_clack 4 hours ago|||
It would be pretty cool to have interchangeable usb keys with models on them.
stymaar 2 hours ago||
I don't know when the sigmoid is going to kick in, but Nvidia's quarterly datacenter revenues have grown 15-fold over the past 3 years [1], and nobody, including Scott, believes this is sustainable for 3 more years; otherwise Nvidia's market cap would conservatively be at least an order of magnitude higher than it is.

Every exponential eventually becomes a sigmoid, because exponential growth always exposes limiting factors that weren't limiting at the beginning. Silicon manufacturing had lots of room for high-margin customers like Nvidia even a year ago (by the mere virtue of outbidding lower-margin customers), but that room is now mostly gone, and no amount of money will make fabs build themselves overnight.

[1]: https://stockanalysis.com/stocks/nvda/metrics/revenue-by-seg...

gm678 5 hours ago||
I don't know what the Y-axis is supposed to be on that Wharton AI capabilities graph, but I am not really convinced that Opus 4.6 has more than double the intelligence/capability/whatever of GPT 5.1 Max.
NitpickLawyer 5 hours ago||
IIRC that graph tracks capability as the time it takes a human to solve a task (i.e. the model can now handle tasks that usually take a human ~8h). Which, depending on what tasks you look at, could be a reasonable finding. I could see Opus 4.6 handling tasks that take humans ~8h and that 5.1 couldn't previously handle (with 5.1 being "limited" to 4h tasks, say). It is a bit arbitrary, but I think this is what they're tracking.
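For what it's worth, my rough understanding of how the METR-style headline number (which later comments point to) is produced: measure whether the model succeeds on each task, regress success against the log of the human completion time, and report the human time at which the fitted success probability crosses 50%. A sketch with made-up data:

  import numpy as np
  from sklearn.linear_model import LogisticRegression

  # invented data: human completion time per task (minutes) and whether the model solved it
  human_minutes = np.array([2, 5, 8, 15, 30, 60, 120, 240, 480, 960], dtype=float)
  model_solved  = np.array([1, 1, 1, 1,  1,  1,   0,   1,   0,   0])

  X = np.log2(human_minutes).reshape(-1, 1)
  clf = LogisticRegression().fit(X, model_solved)
  crossover = -clf.intercept_[0] / clf.coef_[0][0]   # log2(minutes) where P(success) = 0.5
  print(f"50% time horizon ~ {2 ** crossover:.0f} human-minutes")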
lukan 4 hours ago|||
"It is a bit arbitrary, but I think this is what they're tracking."

I don't know if they can get their numbers right this way, but this seems a much more useful metric than theoretical capabilities.

cyanydeez 4 hours ago||
OK, but aren't you just measuring efficiency and not the big I in AGI improvements?
Leynos 2 hours ago|||
It also measures task coherence—ability to plan, form contingencies, recover from errors, mitigate accumulation of errors, and reconcile findings across a long context window.
jsnell 3 hours ago||||
No? I think you're misunderstanding what is being measured.

It is purely a test of capabilities (can it do a thing that takes a human $X hours), not efficiency (how fast will it do it).

lukan 4 hours ago|||
Yes, but this study was not about that, and "just efficiency" is actually what most people are after.

At least I want AI to solve my problems, not score high on an academic leaderboard.

jrumbut 4 hours ago||||
Without knowing more about their methodology, it seems like a lot of the recent improvements have involved the AI itself taking time to complete the task.

At first the models turned a 5 minute task into a 5 second task (by 5 seconds I mean a very short amount of time, not precisely 5 seconds). Then they turned a 15 minute task into a 5 second task.

Opus 4.6 completes 8 hour tasks all the time but (at least in my experience) it isn't spitting the answer out in 5 seconds anymore. It's using chain of thought and tools and the time to completion is measured in minutes or maybe hours.

In my experiments with local LLMs, a substantial part of the gap between frontier and local (for everyday use) is in tooling and infrastructure.

That is why I am sympathetic to the idea we are leveling off. But to bring in the air speed example from the article, I don't think we've reached the equivalent of the ramjet yet. I suspect in the coming years there will be new architectures, new hardware, and new ways to get even more capable models.

Leynos 2 hours ago||
It measures the ability to complete (at a given success rate) a task with a known human benchmark time. I.e., they gave the task to human volunteers and timed how long they took to complete it.
MadxX79 4 hours ago|||
I don't know why people are so impressed by 8h.

I trained an LLM to write the whole Harry Potter series, and that took JK Rowling like 17 years.

For my next point on the graph, I'll train the LLM to write the Bible, something that took humans >1500 years.

Leynos 2 hours ago||
Look at the tasks in the benchmark (see §2 https://arxiv.org/html/2503.14499v3)
strken 4 hours ago|||
Check out Re-Bench and HCAST.

The tasks are obviously all of the form "Go do this, and if you get the following output you passed". Setting up a web server apparently takes 15 minutes for a human, which is news to me since I'm able to search for https://gist.github.com/willurd/5720255, find the python one-liner, and copy it within about ten seconds.

Anyway, this is cool, but it does not mean Claude can perform any human task that takes less than 8 hours and is within its physical capabilities.

throwaway27448 4 hours ago|||
> more than double the intelligence/capability/whatever

I'm curious what people really mean when they say this. Intelligence is famously hard to define, let alone measure; it certainly doesn't scale linearly; it only loosely correlates to real-world qualities that are easy to measure; etc. Are you referring to coding ability or...?

adw 4 hours ago|||
https://podcasts.apple.com/us/podcast/machine-learning-stree... is a pretty good primer on METR, what it measures, and its limitations.
myhf 4 hours ago|||
According to this article: whenever someone games a benchmark to make an upward chart on some y-axis, it's YOUR responsibility to prove how and why that trend can't continue indefinitely.

🙄

skybrian 4 hours ago|||
Seems to me that the default is "I don't know what's going to happen" and if you're making a confident prediction, bring evidence.

Scott makes a Lindy effect argument, which is plausible, but don't let that fool you: we still don't know what's going to happen.

AnimalMuppet 4 hours ago|||
I'm pretty sure that gaming benchmarks can continue indefinitely.
BoredPositron 5 hours ago||
https://metr.org/time-horizons/ on a linear scale. Clickbait garbage article, like most of his in the last year.
afthonos 5 hours ago||
…yeah, that’s where you see the exponential?
baxtr 42 minutes ago||
> The moral of the story is that, even though all exponentials eventually become sigmoids, this doesn’t necessarily happen at the exact moment you’re doing your analysis. Sometimes they stay exponential for much longer than that!

All exponentials eventually become sigmoids? I don't think this can be true without qualifiers.

jvanderbot 39 minutes ago|
All models are wrong, of course, but this is kind of "common sense", so it's not hard to accept as true for a natural system. How can something keep growing exponentially forever without hitting a new blocker that causes a slowdown, or encountering pushback that turns it into an oscillator? A pendulum looks exponential when it is at its peak and accelerating down.

The issue is that the exponential-looking part of the sigmoid might contain all of human history, sure, but most folks who espouse this theory probably agree that over time everything reaches a steady-enough state to be considered non-exponential, or becomes oscillatory.

philipallstar 5 hours ago||
But they do explain the improvement of AI driving in 2017-2021 vs 2022-2026.
OscarCunningham 4 hours ago|
John D Cook gives more technical details here: "Trying to fit a logistic curve" https://www.johndcook.com/blog/2025/12/20/fit-logistic-curve...
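The gist of that post in one toy comparison (parameters invented): a logistic that will eventually flatten at 100 and the pure exponential matching its early growth are numerically almost indistinguishable until you're close to the inflection point, which is why estimating the ceiling from early data is so ill-conditioned:

  import numpy as np

  def logistic(t, ceiling, rate, midpoint):
      return ceiling / (1 + np.exp(-rate * (t - midpoint)))

  t = np.arange(0, 31)
  sigmoid = logistic(t, ceiling=100.0, rate=0.5, midpoint=20.0)
  exponential = 100.0 * np.exp(0.5 * (t - 20.0))   # same early behaviour, no ceiling

  for ti in (5, 10, 15, 20, 25, 30):
      gap = abs(sigmoid[ti] - exponential[ti]) / sigmoid[ti]
      print(f"t={ti:2d}  sigmoid={sigmoid[ti]:7.2f}  exponential={exponential[ti]:9.2f}  relative gap={gap:.1%}")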