
Posted by pr337h4m 23 hours ago

Amateur armed with ChatGPT solves an Erdős problem(www.scientificamerican.com)
https://www.erdosproblems.com/1196
552 points | 380 comments
nomilk 8 hours ago|
A similar announcement was made a few months ago, and Terence Tao came out a few days later and said it wasn't what it seemed at first, in that it was a rediscovery of an already known (albeit esoteric) result...
logicprog 5 hours ago|
They literally have a quote from Tao in the article saying it was a novel approach humans hadn't tried, and that the problem hadn't been solved even after a lot of professional attention.
dnnddidiej 5 hours ago||
How do you get real mathematicians to check the potential slop? At some point there will be spam to Tao from claws finding problems to solve and submitting possible proofs/answers.
brohee 2 hours ago|
In the end "proofs" that are not machine checked will be left unread unless submitted by someone very respected in the field...
resident423 14 hours ago||
I wonder if the rationalizations people come up with for why this isn't real intelligence will be as creative as ChatGPT's solution.
thesmtsolver2 13 hours ago||
Remember when people thought multiplying numbers, remembering a large number of facts, and being good at rote calculations was intelligence?

Some people think that multiplying numbers, remembering a large number of facts, and being good at calculations is intelligence.

Most intelligent people do not think that.

Eventually, we will arrive at the same conclusion for what LLMs are doing now.

resident423 13 hours ago|||
Remember when people thought solving Erdos problems required intelligence? Is there anything an LLM could ever do that would count as intelligence? Surely the trend has to break at some point; if so, what would be the thing that crosses the line into real intelligence?
NitpickLawyer 10 hours ago|||
> Remember when people thought solving Erdos problems required intelligence? Is there anything an LLM could ever do that would count as intelligence?

Hah. It reminds me of this great quote, from the '80s:

> There is a related “Theorem” about progress in AI: once some mental function is programmed, people soon cease to consider it as an essential ingredient of “real thinking”. The ineluctable core of intelligence is always in that next thing which hasn’t yet been programmed. This “Theorem” was first proposed to me by Larry Tesler, so I call it Tesler’s Theorem: “AI is whatever hasn’t been done yet.”

We are seeing this right now in the comments. 50 years later, people are still doing this! Oh, this was solved, but it was trivial, of course this isn't real intelligence.

latexr 8 hours ago||
That is a “gotcha” born of either ignorance (nothing wrong with that, we’re all ignorant of something) or bad faith. Definitions shift as we learn more. Darwin’s definition of life is not the same as Descartes’ or Plato’s or anyone in between or since because we learn and evolve our thinking.

Are you also going to argue definitions of life before we even learned of microscopic or single cell organisms are correct and that the definitions we use today are wrong? That they are shifting goal posts? That “centuries later, people are still doing this”? No, that would be absurd.

NitpickLawyer 8 hours ago||
I don't see it as a gotcha. Just an (evergreen, it seems) observation that people will absolutely move the goalposts every time there's something new. And people can be ignorant outsiders or experts in that field as well.

For example, ~2 years ago, an expert in ML publicly made this remark on stage: LLMs can't do math. Today they absolutely and obviously, can. Yet somehow it's not impressive anymore. Or, and this is the key part of the quote, this is somehow not related to "intelligence". Something that 2 years ago was not possible (again, according to a leading expert in this field), is possible today. And yet this is somehow something that they always could do, and since they're doing it today, is suddenly no longer important. On to the next one!

No idea why this is related to Darwin or definitions of life. The definitions don't change. What people considered important 2 years ago is suddenly not important anymore. The only thing that changed is that today we can see that capability. Ergo, the quote holds.

latexr 7 hours ago||
> For example, ~2 years ago, an expert in ML

See, that’s a poor argument already. Anyone could counter that with other experts in ML publicly making remarks that AI would have replaced 80% of the work force or cured multiple diseases by now, which obviously hasn’t happened. That’s about as good an argument as when people countered NFT critics by citing how Clifford Stoll said the internet was a fad.

> made this remark on stage: LLMs can't do math. Today they absolutely and obviously, can.

How exactly are “LLMs can’t” and “do math” defined? As you described it, that sentence does not mean “will never be able to”, so there’s no contradiction. Furthermore, it continues to be true that you cannot trust LLMs on their own for basic arithmetic. They may e.g. call an external tool to do it, but pattern matching on text isn’t sufficient.

> The definitions don't change.

Of course they do, what are you talking about? Definitions change all the time with new information. That’s called science.

NitpickLawyer 7 hours ago||
The definition of "can/cannot do math" didn't change. That's not up for debate. 2 years ago they couldn't solve an Erdos problem (people have tried; Tao tried ~1 year ago). Today they can.

Definitions don't change. The idea that, now that they can, it's no longer intelligence is what's changing. And that's literally moving the goalposts. Read the thread here, go to the bottom part. There are zillions of comments saying this.

You seem keen on not trying to understand what the quote is saying. This is not a good-faith discussion, and it's not going anywhere. We're already miles from where we started. The quote is an observation (and an old one at that) about goalposts moving. If you can't or won't see that, there's no reason to continue this thread.

latexr 5 hours ago||
> The definition of "can/cannot do math" didn't change. That's not up for debate.

That is not the argument. The point is that the way you phrased it is ambiguous. “Math” isn’t a single thing, and “cannot” can either mean “cannot yet” or “cannot ever”. I don’t know what the “expert” said since you haven’t provided that information, I’m directly asking you to clarify the meaning of their words (better yet, link to them so we can properly arrive at a consensus).

> Definitions don't change.

Yes they do! All the time!

https://www.merriam-webster.com/wordplay/words-that-used-to-...

> And that's literally moving the goalposts.

Good example. There are no literal goal posts here to be moved. But with the new accepted definition of the words, that’s OK.

> There are zillions of comments saying this.

Saying what, exactly? Please be clear, you keep being ambiguous. The thread barely crossed a couple of hundred comments as of now, there are not “zillions” of comments in agreement of anything.

> You are keen to not trying to understand what the quote is saying. (…) If you can't or won't see that, there's no reason to continue this thread.

Indeed, if you ascribe wrong motivations and put a wall before understanding what someone is arguing, there is indeed no reason to continue the thread. The only wrong part of your assessment is who is doing the thing you’re complaining about.

yfee 5 hours ago||
He’s a booster and I don’t think he argues in good faith.

He seems to be fixated on this notion that humans are static and do not evolve - clearly this is false. What people thought of as a determinant of intelligence also changes as things evolve.

noosphr 12 hours ago||||
I've spent a good chunk of time formalising mathematics.

Doing formalized mathematics is as intelligent as multiplying numbers together.

The only reason why it's so hard now is that the standard notation is the equivalent of Roman numerals.

When you start using a sane metalanguage, and not just augmented English, to do proofs, you gain the same increase in capabilities as going from word equations to algebra.

xxs 7 hours ago||
>the standard notation is the equivalent of Roman numerals.

But the Roman numerals are easy. I was able to use them before 1st grade and I can't touch any "standard notation" to this day.

thesmtsolver2 12 hours ago||||
When will LLM folks realize that automated theorem provers have existed for decades, and that non-ML theorem provers have solved non-trivial Math problems tougher than this Erdos problem?

Proposing and proving something like Gödel's theorem definitely requires intelligence.

Solving an already proposed problem is just crunching through a large search space.

throwaway198846 7 hours ago|||
Automated theorem provers can't prove this problem. Which non-trivial Math problems do you think are tougher than this Erdos problem?
virgildotcodes 7 hours ago||||
So the only intelligent people in history are those who invent new fields of mathematics, got it.

You can just about make out those goalposts on the surface of the moon with a good telescope at this point.

crazylogger 10 hours ago|||
"Hi ChatGPT, propose and prove something radically new in the genre of Gödel's theorem."

How is this not just another proposed problem (albeit with a search space much larger than an Erdos problem's)?

dmurray 8 hours ago||
I think the point the GP is making is that Gödel's theorem wasn't part of any "genre". Gödel, or somebody, had to invent the whole field, and we haven't seen LLMs invent new fields of mathematics yet.

But this isn't a fair bar to hold it to. There are plenty of intelligent people out there, including 99% of professional mathematicians, who never invent new fields of mathematics.

_0ffh 7 hours ago|||
Well, the famous Turing test was evidently insufficient. All that happened is that the test is dead and nobody ever mentions it anymore. I'm not sure that any other test would fare any better once solved.
heresie-dabord 4 hours ago|||
I've had a similar notion that Time() is a necessary test function. Maybe it's because of the limitations of human cognition. (We have biases and blind-spots and human intelligence itself is erratic.)

I find it's helpful to avoid conflating the following three topics:

/1/ Is the tool useful?

/2/ At scale, what is the economic opportunity and social/environmental impact?

/3/ Is the tool intelligent?

Casual observation suggests that most people agree on /1/. An LLM can be a useful tool. (Present case: someone found a novel approach to a proof.) So are pocket calculators, personal computers, and portable telephones. None of these tools confers intelligence, although these tools may be used adeptly and intelligently.

For /2/, any level of observation suggests that LLMs offer a notable opportunity and have a social/environmental impact. (Present case: students benefitted in their studies.) A better understanding comes with Time() ... our species is just not good at preparing for risks at scale. The other challenge is that competing interests may see economic opportunities that don't align for social/environmental Good.

Topic /3/ is of course the source of energetic, contentious debate. Any claim of intelligence for a tool has always had a limited application. Even a complex tool like a computer, a modern aircraft, or a guided missile is not "intelligent". These tools are meant to be operated by educated/trained personnel. IBM's Deep Blue and Watson made headlines -- but was defeating humans at games proof of Intelligence?

On this particular point, we should worry seriously about conferring trust and confidence on stochastic software in any context where we expect humans to act responsibly and be fully accountable. No tool, no software system, no corporation has ever provided a guarantee that harm won't ensue. Instead, they hire very smart lawyers.

famouswaffles 12 hours ago|||
None of it is really from logical thought. The rationalizations don't make any sense, but they haven't for a while. It's an emotional response. Honestly, it's to be expected.
threethirtytwo 12 hours ago||
It's because HN is not really full of smart people. It's full of people who think they're smart and take pride in the idea that they're pretty intelligent.

ChatGPT equalizes intelligence. And that is an attack on their identity. It also exposes their ACTUAL intelligence which is to say most of HN is not too smart.

missingdays 8 hours ago|||
> ChatGPT equalizes intelligence

Citation needed

simianwords 8 hours ago||
how can you ask this question on a post titled "Amateur armed with ChatGPT solves an Erdős problem"???? are you looking for some randomised controlled trial? omg
adQ28 4 hours ago|||
We just look at comments from AI boosters and it is self-evident that no intelligence is being equalized.
JumpCrisscross 5 hours ago|||
Idk, going out on a limb and guessing the folks who hang out on erdosproblems.com aren’t run-of-the-mill dumbasses. The prompt, if you look at it, is actually quite clever. Not as clever as the proof. But far from the equalization OP posits.
simianwords 5 hours ago||
Directionally it is correct - an amateur wouldn’t be able to do this without ChatGPT. You can’t expect maximal democratisation
bsza 8 hours ago|||
> ChatGPT equalizes intelligence

Yes, I love living in communism too. Imagine if you had to pay money for it or something. The wealthiest people would get unrestricted access to intelligence while the poor none. And the people in the middle would eventually find themselves unable to function without a product they can no longer afford. Chilling, huh? Good thing humans are known for sharing in the benefits of technological progress equally. /s

Jtarii 5 hours ago|||
Huh?

Before ChatGPT, it cost ~$100,000 to acquire intelligence good enough to solve this Erdos problem; now it costs ~$200.

I'm really confused at what you are even taking an issue with.

simianwords 8 hours ago|||
what? the post is literally titled "Amateur armed with ChatGPT solves an Erdős problem". stop spreading FUD about unaffordability
bsza 7 hours ago||
They used ChatGPT Pro to solve it. Over 50% of people in the world couldn't afford ChatGPT Pro ($200/mo) even if they spent more than half of their income on it. [1]

What was that about "spreading FUD about unaffordability"?

[1] https://ourworldindata.org/grapher/share-living-with-less-th...

sunaookami 7 hours ago||
They didn't buy ChatGPT Pro themselves. You could've done the same as the students in the article and get a free subscription if you were interested in this instead of trolling.
bsza 6 hours ago||
> You could've done the same

Please show me the steps to get a $200 subscription for free that works 100% of the time regardless of who you are. I'm listening.

simianwords 5 hours ago||
ChatGPT flattened the difference between top .0001 percentile mathematician and an amateur. This is the definition of making intelligence more available.

You are exaggerating the situation by essentially claiming that since some people can't afford 200 dollars, ChatGPT is not democratising intelligence. It's a bit strange to claim this because, according to you, it only becomes affordable when the maximal number of people can afford it. It's a bit childish.

Directionally it is democratising. Are more people able to afford higher level intelligence? Yes.

bsza 5 hours ago||
> ChatGPT flattened the difference between top .0001 percentile mathematician and an amateur

It flattened the difference between a top epsilon percentile mathematician and an amateur with money. It didn't flatten the difference between an amateur with a little money and an amateur with a lot of money. It widened it. That's the part I'm scared about.

You are shrugging this off because it currently isn't that expensive. But we're talking about the massively subsidized price here, which is bound to get orders of magnitude higher when the bubble pops. Models are also likely to get much better. If it gets to a point where the only way to obtain exceptionally high intelligence is with an exceptionally high net worth and vice versa, how is that going to democratize anything?

slashdave 11 hours ago|||
Proving a negative is a pretty high bar. You also have the problem of defining "real intelligence", which I suspect you can't.
famouswaffles 11 hours ago||
Intelligence is Intelligence. It's intelligent because it does intelligent things. If someone feels the need to add a 'real' and 'fake' moniker to it so they can exclude the machine and make themselves feel better (or for whatever reason) then they are the one meant to be doing the defining, and to tell us how it can be tested for. If they can't, then there's no reason to pay attention to any of it. It's the equivalent of nonsensical rambling. At the end of the day, the semantic quibbling won't change anything.
latexr 8 hours ago||
> It's intelligent because it does intelligent things.

Most people would consider someone who can calculate 56863*2446 instantly in their head to be intelligent. Does that mean pocket calculators are intelligent? The result is the same.

> then they are the one meant to be doing the defining, and to tell us how it can be tested for. If they can't, then there's no reason to pay attention to any of it.

That is the equivalent of responding to criticism with “can you do better?”. One does not need to be a chef (or even know how to cook) to know when food tastes foul. Similarly, one does not need to have a tight definition of “life” to say a dog is alive but a rock isn’t. Definitions evolve all the time when new information arises, and some (like “art”) we haven’t been able to pin down despite centuries of thinking about it.

famouswaffles 3 hours ago||
>Most people would consider someone who can calculate 56863*2446 instantly in their head to be intelligent. Does that mean pocket calculators are intelligent? The result is the same.

If you wanted to insist a calculator wasn't intelligent and satisfy my conditions then you can. At the very least you can test for the sort of intelligence that is present in humans but absent from calculators and cleanly separate the two. These are very easy conditions if there is some actual real difference.

>That is the equivalent of responding to criticism with “can you do better?”. One does not need to be a chef (or even know how to cook) to know when food tastes foul.

No it's not, and this is a silly argument. Foul food tastes different. Sometimes it even looks different. You can test for it and satisfy my conditions.

You come across a shiny piece of yellow metal that you think is gold. It looks like gold, feels like gold and tests like gold. Suddenly a strange fellow comes about insisting that it's not actually gold. No, apparently there is a 'fake' gold. You are intrigued, so you ask him, "Alright, what exactly is fake gold, and how can I test for it or tell them apart?" But this fellow is completely unable to answer either question. What would you say about him? He's nothing more than a madman rambling about a distinction he made up in his head.

What I'm asking you to do is incredibly easy and basic if there is a real distinction. I'm not going to tell you to stop believing in your fake gold, but I am going to tell you that neither I nor anyone else can be expected to take you seriously.

latexr 2 hours ago||
> At the very least you can test for the sort of intelligence that is present in humans but absent from calculators and cleanly separate the two.

But you can only do that now, in hindsight. Before calculators, one could argue being able to do math was a sign of intelligence, but once something new comes along which can do math in a non-intelligent way, you can realise “ah, right, my definition was incomplete/incorrect, I need something better”.

> Foul food tastes different.

You’re right, that was a bad example.

> You come across a shiny piece of yellow metal that you think is gold. (…) He's nothing more than a mad man rambling about a distinction he made up in his head.

No, that is not right. Fool’s gold is a thing.

https://en.wikipedia.org/wiki/Pyrite

It’s not the same as gold and you can test for it, but that doesn’t mean you know how to do it. Yet it’s perfectly possible that by being exposed to the real and fake thing you’ll get a feel for each one as there are subtle visual clues. It doesn’t mean you can articulate exactly what those are, yet you’re able to do it.

It’s like tasting two similar beers or sodas. You may be able to identify them by taste and understand their difference but be unable to articulate exactly how you know which is which, to the point that someone else could use your verbal instructions to tell the difference. That doesn’t mean the difference isn’t there or that you can’t tell; it just means you haven’t yet found the proper way to extract and impart what you instinctively understood.

famouswaffles 2 hours ago||
>But you can only do that now, in hindsight.

No, you could always do that. The meaning you take from it is up to you, but you could always separate humans and calculators.

>No, that is not right. Fool’s gold is a thing.

I know what fool's gold is. I used it for contrast. Fool's gold can be tested for.

>but that doesn’t mean you know how to do it.

It doesn't matter. If you claim it exists but you don't know how to do it and you can't point to anyone who can, it's the same as something you made up.

>It’s like tasting two similar beers or sodas. You may be able to identify them by taste and understand they’re difference but be unable to articulate exactly how you know which is which to the point someone else can use your verbal instructions to know the difference.

You are still making the same mistake. Two similar beers or sodas taste different. No one is asking you to come up with a theory of intelligence. All you have to say here is the equivalent of "It tastes different" and let me taste it for myself. But even that much, you cannot do. So why on earth should I treat what you say as worth anything?

chrishare 12 hours ago|||
LLMs are definitely intelligent - just not general like humans, and very, very jagged (succeeding and failing in head-scratching ways).
vatsachak 12 hours ago|||
Well it still gets easy problems wrong

With real general intelligence you'd expect it to solve problems above a certain difficulty at a good clip

pepa65 11 hours ago||
That "it" is a huge variety and range of things...
walrus01 14 hours ago|||
For one, everything its 'intelligence' knows about solving the problem is contained within the finite context window memory buffer size for the particular model and session. Unless the memory contents of the context window are being saved to storage and reloaded later, unlike a human, it won't "remember" that it solved the problem and save its work somewhere to be easily referenced later.
in-silico 12 hours ago|||
For one, everything humans' "intelligence" knows about solving the problem is contained within the finite brain size for the particular person and life. Unless the memory contents of the brain are being saved to storage and reloaded later, it won't "remember" that it solved the problem and save its work somewhere to be easily referenced in a later life.
jychang 14 hours ago||||
There's humans that have memory issues, or full blown Anterograde amnesia.
emp17344 13 hours ago||
There are humans who can’t read. That doesn’t mean Grammarly is “intelligent”. These things are tools - nothing more, nothing less.
resident423 14 hours ago||||
What you're describing sounds more like the model lacking awareness than lacking intelligence? Why does it need to know it solved the problem to be intelligent?
walrus01 13 hours ago||
We say African Elephants are intelligent for a number of reasons, one of which is because they remember where sources of water are in very dry conditions, and can successfully navigate back to them across relatively large distances. An intelligent being that can't remember its own past is at a significant disadvantage compared to others that can, which is exactly one of the reasons why Alzheimer's patients often require full-time caregivers.
resident423 13 hours ago|||
There's probably a limit to how intelligent something can be with no long term memory, but solving Erdos problems in 80 minutes is clearly not above it, and I think the true limit is probably much higher than that.
peteforde 13 hours ago|||
You are confusing lack of intelligence with the presence of impairment.
charcircuit 12 hours ago||||
As another commenter pointed out, these models are being trained to save and read context from files, so denying them the use of an ability they have just makes your claim tautological.
bpodgursky 13 hours ago|||
All modern harnesses write memory files for context later.
bsder 12 hours ago|||
<edit> My mistake. Responded to a bot but can't delete now. Sorry. <edit>
resident423 12 hours ago||
No, but I'm interested to know what it is?
tomlockwood 13 hours ago|||
I think one day the VCs will have given the monkeys on typewriters enough money that these kinds of comments can be generated without human intervention.
otabdeveloper3 12 hours ago|||
[dead]
catcowcostume 13 hours ago|||
You're really telling on yourself if you think an LLM is intelligence
techblueberry 13 hours ago|||
"This is real intelligence" is the bear position, so I think it's real intelligence.
0xBA5ED 13 hours ago||
And how about the creative rationalizations about how statistical text generation is actual intelligence? As if there is any intent or motive behind the words that are generated or the ability to learn literally any new thing after it has been trained on human output?
tptacek 12 hours ago|||
2022 called, wants this argument back. When you're "statistically generating text" to find zero-day vulnerabilities in hard targets, building Linux kernel modules, assembly-optimizing elliptic curve signature algorithms, and solving arbitrary undergraduate math problems instantaneously --- not to mention apparently solving Erdos problems --- the "statistical text" stuff has stopped being a useful description of what's happening, something closer to "it's made of atoms and obeys the laws of thermodynamics" than it is to "a real boundary condition of what it can accomplish".

I don't doubt that there are many very real and meaningful limitations of these systems that deserve to be called out. But "text generation" isn't doing that work.

emp17344 12 hours ago||
But the systems that do that impressive work are no longer just LLMs. Look at the Claude Code leak - it’s a sprawling, redundant maze relying on tools and tests to approximate useful output. The actual LLM is a small portion of the total system. It’s a useful tool, but it’s obviously not truly intelligent - it was hacked together using the near-trillions of dollars AI labs have received for this explicit purpose.
tptacek 12 hours ago||
What does this matter? You can build a working coding agent for yourself extremely quickly; it's remarkably straightforward to do (more people should). But look underneath all the "sprawling tools": the LLM itself is a sprawling maze of matrices. It's all sprawling, it's all crazy, and it's insane what they're capable of doing.

Again if you want to say they're limited in some way, I'm all ears, I'm sure they are. But none of that has anything to do with "statistical text generation". Apparently, a huge chunk of all knowledge work is "statistical text generation". I choose to draw from that the conclusion that the "text generation" part of this is not interesting.

emp17344 12 hours ago||
Well, hang on a second - it sounds like you may actually disagree with the user who created this thread. That user claims that these systems exhibit “real intelligence”, and success on this Erdos problem is proof.

You seem to be making the claim that LLMs are statistical text generators, but statistical text generation is good enough to succeed in certain cases. Those are different arguments. What do you actually believe? Are we even in disagreement?

tptacek 12 hours ago|||
I don't have any opinion about "real intelligence" or not. I'm not a P(doom)er, and I don't think we're on the brink of ascending as a species. But I'm also allergic to arguments like "they're just statistical text generators", because that truly does not capture what these things do or what their capabilities are.
baxtr 11 hours ago||
Just to clarify because I’m not sure I understand:

So you agree that LLMs are in fact statistical text generators, but you don't like people using that fact in arguments about the capabilities of these things?

Jtarii 8 hours ago|||
It's like a genotype/phenotype distinction, the genotype may be statistical text generator but the phenotype is something much more.
fc417fc802 10 hours ago|||
Not parent but I think you're being rather dense. They are _obviously_ statistical text generators. There's plenty of source code out there, anyone can go and inspect it and see for themselves so disputing that is akin to disputing the details of basic arithmetic.

But it is no longer useful to bring that fact up when conversing about their capabilities. Saying "well it's a statistical text generator so ..." is approximately as useful as saying "well it's made of atoms so ...". There are probably some very niche circumstances under which statements of each of those forms is useful but by and large they are not and you can safely ignore anyone who utters them.

pepa65 11 hours ago|||
He does say that LLMs are just a part of the models used these days.
resident423 13 hours ago|||
Solving open math problems is strong evidence of intelligence so there's not really any need for rationalization? I don't understand why intelligence would require intent or motive? Isn't intent just the behaviour of making a specific thing happen rather than other things?
x3ro 13 hours ago|||
I'm curious, do you think that this also applies to stable diffusion? Are these models "creative" too?
resident423 12 hours ago|||
I haven't used stable diffusion enough to have a strong opinion on it. But my thinking is LLMs have only recently started contributing novel solutions to problems, so maybe there is some threshold above which there's less sloppy remixing of training data and more ability to form novel insights, and image generators haven't crossed this line yet.
famouswaffles 12 hours ago|||
Yeah? Those models are creative.
0xBA5ED 12 hours ago|||
The LLM did not solve the problem.
baxtr 11 hours ago||
Who did then?
dataflow 11 hours ago||
Question for those who believe LLMs aren't intelligent and are merely statistical word predictors: how do you reconcile such achievements with that point of view?

(To be clear: I'm not agreeing or disagreeing. I sometimes feel the same too. I'm just curious how others reconcile these.)

fc417fc802 10 hours ago||
Those things aren't mutually exclusive. They are demonstrably statistical token predictors (go examine an open source implementation) and they clearly exhibit intelligence.
downboots 9 hours ago||
It doesn't matter if you use a car or go there walking. If your goal is cave exploration, the tools are irrelevant.
azan_ 7 hours ago||
But in this specific case AI actually explored the cave for you. Comparing it to car getting you to the cave is really bad comparison.
downboots 4 hours ago||
Whoosh
iwontberude 2 hours ago||
Key quote I went into the article looking for and was not disappointed: “The raw output of ChatGPT’s proof was actually quite poor. So it required an expert to kind of sift through and actually understand what it was trying to say,” Lichtman says.
Drupon 9 hours ago||
>ChatGPT, prompted by an amateur, solves an Erdős problem.

There, fixed that for you.

userbinator 14 hours ago||
The LLM took an entirely different route, using a formula that was well known in related parts of math, but which no one had thought to apply to this type of question.

Of course LLMs are still absolutely useless at actual maths computation, but I think this is one area where AI can excel --- the ability to combine many sources of knowledge and synthesise may sometimes yield very useful results.

Also reminds me of the old saying, "a broken clock is right twice a day."

jaggederest 14 hours ago||

    > Every Mathematician Has Only a Few Tricks
    > 
    > A long time ago an older and well-known number theorist made some disparaging remarks about Paul Erdös’s work.
    > You admire Erdös’s contributions to mathematics as much as I do,
    > and I felt annoyed when the older mathematician flatly and definitively stated
    > that all of Erdös’s work could be “reduced” to a few tricks which Erdös repeatedly relied on in his proofs.
    > What the number theorist did not realize is that other mathematicians, even the very best,
    > also rely on a few tricks which they use over and over.
    > Take Hilbert. The second volume of Hilbert’s collected papers contains Hilbert’s papers in invariant theory.
    > I have made a point of reading some of these papers with care.
    > It is sad to note that some of Hilbert’s beautiful results have been completely forgotten.
    > But on reading the proofs of Hilbert’s striking and deep theorems in invariant theory,
    > it was surprising to verify that Hilbert’s proofs relied on the same few tricks.
    > Even Hilbert had only a few tricks!
    > 
    > - Gian-Carlo Rota - "Ten Lessons I Wish I Had Been Taught"
https://www.ams.org/notices/199701/comm-rota.pdf
yayachiken 12 hours ago||
I think when thinking about progress as a society, people need to internalize better that we all without exception are on this world for the first time.

We may have collectively filled libraries full of books, and created yottabytes of digital data, but in the end to create something novel somebody has to read and understand all of this stuff. Obviously this is not possible. Read one book per day from birth to death and you still only get to consume like 80*365=29200 books in the best case, from the millions upon millions of books that have been written.

So these "few tricks" are the accumulation of a lifetime of mathematical training, the culmination of the slice of knowledge that the respective mathematician immersed themselves into. To discover new math and become famous you need both the talent and skill to apply your knowledge in novel ways, but also be lucky that you picked a field of math that has novel things with interesting applications to discover plus you picked up the right tools and right mental model that allows you to discover these things.

This does not go for math only, but also for pretty much all other non-trivial fields. There is a reason why history repeats.

And it's actually a compelling argument why AI is still a big deal even though it's at its core a parrot. It's a parrot yes, but compared to a human, it actually was able to ingest the entirety of human knowledge.

smaudet 11 hours ago||
> it actually was able to ingest the entirety of human knowledge

Even this, though, is not useful, to us.

It remains true that a life without struggle and achievement is not really worth living...

So, it is nice that there is something that could possibly ingest the whole of human knowledge, but that is still not useful, to us.

People are still making a hullabaloo about "using AI" in companies, and there was some nonsense about how there would be only two types of companies, AI ones and defunct ones, but in truth there will simply be no companies...

Anyways I'm sure I will get down voted by the sightless lemmings on here...

nopinsight 13 hours ago|||
> "a broken clock is right twice a day."

The combinatorial nature of trying things randomly means that it would take millennia or longer for light-speed monkeys typing at a keyboard, or GPUs, to solve such a problem without direction.

By now, people should stop dismissing RL-trained reasoning LLMs as stupid, aimless text predictors or combiners. They wouldn’t say the same thing about high-achieving, but non-creative, college students who can only solve hard conventional problems.

Yes, current LLMs likely still lack some major aspects of intelligence. They probably wouldn’t be able to come up with general relativity on their own with only training data up to 1905.

Neither did the vast majority of physicists back then.

amazingman 12 hours ago||
> Yes, current LLMs likely still lack some major aspects of intelligence.

Indeed, and so do current humans! And just like LLMs, humans are bad at keeping this fact in view.

On a more serious note, we're going to have a hard time until we can psychologically decouple the concepts of intelligence and consciousness. Like, an existentially hard time.

y0eswddl 13 hours ago|||
Yeah, they're great at interpolation - they'll just never be worth much at extrapolation.
SR2Z 13 hours ago|||
Luckily for us, whole fortunes can be made by filling in the blanks between what we know and what we realize.
javawizard 12 hours ago|||
That deserves to be on a plaque somewhere.

I've been using LLMs for much the same purpose: solving problems within my field of expertise where the limiting factor is not intelligence per se, but the ability to connect the right dots from among a vast corpus of knowledge that I would never realistically be able to imbibe and remember over the course of a lifetime.

Once the dots are connected, I can verify the solutions and/or extend them in creative ways with comparatively little effort.

It really is incredible what otherwise intractable problems have become solvable as a result.

dalyons 11 hours ago||
What’s your field
speed_spread 10 hours ago||
Paint by numbers
jedmeyers 12 hours ago|||
And by having more of those blanks filled humans might be able to come up with much better extrapolations than what we have right now.
drdeca 10 hours ago||||
People keep saying this, but the only ways I know of for formalizing this statement, appear to be probably false?

I don’t know what this claim is supposed to mean.

If it isn’t supposed to have a precise technical meaning, why is it using the word “interpolate”?

heresie-dabord 10 hours ago|||
> "a broken clock is right twice a day"

and homo sapiens, glancing at the clock when it happens to be right, may conjure an entire zodiac to explain it.

red75prime 8 hours ago||
And homo sapiens, glancing at a system that gets better and better at solving problems, tries to deny it and comes up with the broken-clock analogy.
nandomrumber 10 hours ago|||
A stopped clock.

A broken clock can be broken in ways which result in it never being correct.

fragmede 4 hours ago||
Those are just analog. If it's a broken digital clock, then all bets are off.
tptacek 13 hours ago|||
Wait, what do you mean "LLMs are still absolutely useless at actual maths computation"? I rely on them constantly for maths (linear algebra, multivariable calc, stat) --- literally thousands of problems run through GPT5 over the last 12 months, and to my recollection zero failures. But maybe you're thinking of something more specific?
schneems 13 hours ago|||
They are bad at math. But they are good at writing code and as an optimization some providers have it secretly write code to answer the problem, run it and give you the answer without telling you what it did in the middle part.
avaer 13 hours ago|||
Someone should tell the mathematicians that if they use a calculator or a whiteboard or heaven forbid a computer, they are "bad at math".
schneems 1 hour ago|||
1) That's not related to chain of thought I was replying to. Someone asked about the "bad at math" and pointed out "but it seems good to me" so I added the color of why that might be the case. Your retort seems to imply I'm making an argument that because something uses tools for a job it cannot be good at the thing it's using a tool for. Which is not the case.

2) If you have something to say, just say it. Don't put words in my mouth and then argue with a thing I didn't say.

tptacek 12 hours ago||||
What would I do to demonstrate that they are bad at math? If by "maths" we mean things like working out a double integral for a joint probability problem, or anything simpler than that, GPT5 has been flawless.
schneems 1 hour ago||
Search the topic. It is historically documented. It might no longer be true though.

A way to test might be running an open model locally, directly (without a harness), where you could be sure it's not going through a translation layer. I think these days this tool call behavior might be built in, but back in the day it was treated more like a magic trick. Without it, simple math behaved similarly to "how many r's are in strawberry".

tempaccount5050 13 hours ago|||
Are they bad at math? Or are they bad at arithmetic?
lacunary 13 hours ago|||
if you don't know much math, it's easy to confuse the two
tptacek 12 hours ago|||
Neither.
jasonfarnon 13 hours ago||||
What tier are you using? I have run lots of problems and am very impressed, but I find stupid errors a lot more frequently than that, e.g., arithmetic errors buried in a derivation or a bad definition, say 1/15 times. I would love to get zero failures out of thousands of (what sounds like college-level math) posed problems.
tptacek 12 hours ago||
I have a standard OpenAI/ChatGPT Pro account; GPT5 is my daily driver for math, and Claude for code.
cuttothechase 12 hours ago||||
Calc, stat, etc. from a textbook are things they would naturally be good at, but I don't think textbook computations that are in the training set, and extrapolations of them, are what's in question here.

They are not great at playing chess either, which is computational as well as analytic.

tptacek 12 hours ago||
I think this is wrong and a category error (none of the problems I've given it are in a textbook; they're virtually all randomized), but, try this: just give me a problem to hand off to GPT5, and we'll see how it does.

Further evidence for the faultiness of your claim, if you don't want to take me up on that: I hand problems off to GPT5 to check my own answers. None of the dumb mistakes I make or missed opportunities for simplification are in any book, and, again: it's flawless at pointing out those problems, despite being primed with a prompt suggesting I'm pretty sure I have the right answers.

ButlerianJihad 12 hours ago||||
I only have rudimentary understanding of calculus, trigonometry, Google Sheets, and astronomy, but I was able to construct an accurate spreadsheet for astrometry calculations by using Grok and Gemini (both free, no subscription, just my personal account) to surface the formulas for measuring the distance between 2-3 points on the celestial sphere. The LLMs assisted me in also writing functions to convert DMS/HMS coordinates to decimal, and work in radians as well.

I found and fixed bugs I wrote into the formulas and spreadsheets, and the LLMs were not my sole reference, but once the LLM mentioned the names of concepts and functions, I used Wikipedia for the general gist of things, and I appreciated the LLMs' relevant explanations that connected these disciplines together.

I did this on March 14, 2026
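The angular-separation calculation described above can be sketched in a few lines. This is a minimal illustration, not the commenter's actual spreadsheet: the function names are my own, and I'm assuming the spherical law of cosines, since the comment doesn't say which formula the LLMs surfaced.

```python
import math

def hms_to_deg(h, m, s):
    # Right ascension given in hours/minutes/seconds; 1 hour = 15 degrees.
    return (h + m / 60 + s / 3600) * 15

def dms_to_deg(d, m, s):
    # Declination given in degrees/arcminutes/arcseconds, sign carried by d.
    sign = -1 if d < 0 else 1
    return sign * (abs(d) + m / 60 + s / 3600)

def angular_separation(ra1, dec1, ra2, dec2):
    # Spherical law of cosines; all inputs and the result in decimal degrees.
    a1, d1, a2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    cos_t = (math.sin(d1) * math.sin(d2)
             + math.cos(d1) * math.cos(d2) * math.cos(a1 - a2))
    # Clamp to guard against floating-point drift just outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))
```

For example, two points on the celestial equator 90 degrees apart in right ascension come out exactly 90 degrees separated. (For very small separations, a haversine-style formula is numerically better behaved than the law of cosines.)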

Drupon 10 hours ago|||
>I rely on them constantly for maths (linear algebra, multivariable calc, stat)

That's one way to waste a ton of tuition money to just have a clanker do your learning for you.

Unless you're teaching it, in which case I hope your salary is cut by whatever percentage your clanker reduces your workload.

pfdietz 6 hours ago||
Perhaps learning how to get AI to solve your problems is the most important lesson to learn now? The rest seems like the current equivalent of learning cursive.
keyle 13 hours ago|||
The ultimate generalist
karlgkk 14 hours ago||
Also just the sheer value of brute force.

80 hours! 80 hours of just trying shit!

FrasiertheLion 14 hours ago|||
It's 80 minutes, not 80 hours.
jasonfarnon 13 hours ago|||
and you can be sure mathematicians spent way more than 80 hrs on it
ChrisGreenHeur 13 hours ago|||
80 minutes! 80 minutes of just trying shit!
peteforde 13 hours ago||
... shit that solved an apparently significant Erdős problem.

That is not nothing, no matter how much you hate AI.

userbinator 13 hours ago||
It shows that AI is apparently very good at brute-forcing.
TOMDM 12 hours ago|||
Are the human mathematicians who wanted to solve this problem just too stupid to brute force for 80 minutes?
alex_sf 13 hours ago|||
This isn't brute force.
userbinator 11 hours ago||
It is in the same way that educated guessing is.
userbinator 7 hours ago||
Care to actually refute? Interesting that even an LLM would give an attempt at it, but apparently those who only bother to hit the downvote button aren't even meeting that level of "intelligence".
brokencode 13 hours ago||||
How long do you figure it’d take to solve the problem yourself?
echelon 12 hours ago||
Now do P vs NP.

If/when these things solve our hardest problems, that's going to lead to some very uncomfortable conversations and realizations.

ngruhn 7 hours ago||
Nah, people are going to say: It just used these 500 weird tricks from all kinds of different areas. A human could totally have done it. Nobody looked. I guess P/NP wasn't that hard after all.
lucasgerads 11 hours ago||
I feel like a year ago I would have said impossible. Now, I am not so sure anymore. Although, if I wrote the prompt and the correct result were presented to me, I wouldn't even know. I'd still need a mathematician to verify it.
wiseowise 10 hours ago|
Wake me up when it creates cancer cure or fusion reactor.
azan_ 7 hours ago|
So you can move the goalposts again?
wiseowise 7 hours ago||
It was always the same: increasing human life span, space exploration, solving energy crisis.