An OpenAI model has disproved a central conjecture in discrete geometry

Posted by tedsanders 6 hours ago

An OpenAI model has disproved a central conjecture in discrete geometry (openai.com)

656 points | 464 comments | page 3

CGMthrowaway 4 hours ago|

How do you even get an LLM to try to solve one of these problems? When I ask, it just comes back with the name of the problem and says "it can't be done".

lovecg 3 hours ago||

By making it think for 100+ pages https://cdn.openai.com/pdf/1625eff6-5ac1-40d8-b1db-5d5cf925d... Regular ChatGPT users don’t have a way to do that, this is something they do internally only.

edit: apparently that’s only the _condensed summary_ of the chain of thought.

woah 2 hours ago|||

you can do this easily with the api or with codex

KalMann 3 hours ago||

Maybe you need to phrase it better. Like with a more specific direction of thinking.

trostaft 4 hours ago||

Speaking as a postdoc in math, I must say that this is rather exciting. This is outside of my field, but the companion remarks document is quite digestible. It appears as though the proof here is fairly inspired by results in the literature, but the tweaks are non-trivial. Or, at least to me, they appear substantial enough that I would consider the entire publication novel and exciting.

Many of my colleagues and I have been experimenting with LLMs in our research process. I've had pretty great success, though fairly rarely do they solve my entire research question outright like this. Usually, I end up with a back-and-forth process of refinements and questions on my end until eventually the idea becomes apparent. Not unlike my traditional research refinement process, just better. Of course, I don't have access to the model they're using =) .

Nevertheless, one thing that struck me in this writeup, was the lack of attribution in the quoted final response from the model. In a field like math, where most research is posted publicly and is available, attribution of prior results is both social credit and how we find/build abstractions and concentrate attention. The human-edited paper naturally contains this. I dug through the chain-of-thought publication and did actually find (a few of) them. If people working on these LLMs are reading, it's very important to me that these are contained in the actual model output.

One more note: the comments on articles like these on HN and otherwise are usually pretty negative / downcast. There's great reason for that, what with how these companies market themselves and how proponents of the technology conduct themselves on social media. Moreover, I personally cannot feel anything other than disgust seeing these models displace talented creatives whose work they're trained on (often to the detriment of quality). But, for scientists, I find that these tools address the problem of the exploding complexity barrier in the frontier. Every day, it grows harder and harder to contain a mental map of recent relevant progress by simple virtue of the amount being produced. I cannot help but be very optimistic about the ambition mathematicians of this era will be able to scale to. There still remain lots of problems in current era tools and their usage though.

isotypic 1 hour ago||

I cannot quite share your enthusiasm. The clearest analogy that I can think of to try to explain why I feel this way is that it seems there will eventually be a phantom textbook of all of mathematics contained in the weights of an LLM; every definition, every proof, etc; and the role of a mathematician is going to be reduced towards reading certain parts of this phantom textbook (read: prompting an LLM to generate a proof or explore some problem) and sharing the resulting text with others, which of course anybody else could have found if they simply also knew the right point of the textbook.

To be blunt, this seems incredibly uninteresting to me. I enjoy learning mathematics, sure, but I just don't find much inherent meaning in reading a textbook or a paper. The meaning comes from taking those ideas and applying them to my own problems, be it a direct proof of a conjecture or coming up with the right framework or tools for those conjectures. But, of course, in this future, those proofs and frameworks are already in the textbook. So what's the point? If someone cared about these answers in the first place, they probably could have found the right prompt to extract it from this phantom textbook anyways.

You could argue for there being work still like marginal improvements and applying the returned proof to other scenarios as happened in this case, but as above, what is really there to do if this is already in the phantom textbook somewhere and you just need to prompt better? The mathematicians in this case added to the exposition of the proof, but why wouldn't the phantom textbook already have good enough exposition in the first place?

I think my complete dismissal of the value of things like extending the proofs from an LLM or improving exposition is too strong -- there is value in both of them, and likely will always be -- but it would still represent a sharp change in what a mathematician does that I don't think I am excited for. I also don't think this phantom textbook is contained even in the weights of whatever internal model was used here just yet (especially since as some of the mathematicians in the article pointed out, a disproof here did not need to build any new grand theories), but it really does seem to me it eventually will be, and I can't help but find the crawl towards that point somewhat discouraging.

ted_dunning 8 minutes ago|||

In Erdős's idiosyncratic nomenclature, all the best proofs are "in the book" and it was always a joyful thing to not only find a proof, but to find the proof that is in the book.

Who cares if it is God's book or the machine's Xeroxed copy?

k_roy 8 minutes ago|||

And you just expressed the thoughts of every engineer that writes code for a living who is either left behind, or embracing the technology to hit KPIs and QVRs.

umanwizard 2 hours ago||

Why would it excite you, rather than terrifying you? The better LLMs get at math, the closer the expertise you spent your whole life building is to being worthless.

Along with all the rest of what humans find meaningful and fulfilling.

cman1444 33 minutes ago|||

Because for many people who pursue these fundamental truths, the reward is not necessarily personal fame, fortune, or even personal understanding. Advancing humanity's total knowledge (even if that knowledge is by proxy through AI) is reward enough.

krackers 22 minutes ago||||

If one only found meaning in life through external factors like work (no matter how "intellectually rewarding") then it seems like a life destined for eventual disappointment.

ted_dunning 6 minutes ago||||

Does it terrify you to look at children?

Not so many years from now, some of them will surpass you. A few years after that all (that survive to that point) will surpass you.

Does that terrify you just as much?

CamperBob2 2 hours ago|||

What's happening is the verbal/linguistic equivalent of the invention of calculus. No intellectual field will ever be the same again. Who wouldn't find that exciting, and want to experience it?

rogerrogerr 2 hours ago|||

People who enjoy thinking. Ya know, the "intellectual" part.

mlcrypto 31 minutes ago|||

The so called "progressives" prove that they were the same ones crying after the printing press, automobile, calculator, washing machine, etc

ted_dunning 4 minutes ago||

You made up a group in the past, made up things they said, and then drew the inference that a different group in the present is somehow morally disadvantaged by obvious inference.

Perhaps your name-calling is not actually as logically grounded as you think. It definitely seems to depend on unfounded leaps.

aroman 2 hours ago|||

This is the beginning of thinking, not the end...

umanwizard 1 hour ago|||

I'm not sure I grasp the analogy to the invention of calculus. Calculus helped us solve new and interesting math/physics problems. Repeated for emphasis: helped *us* solve.

This technology is solving interesting math/physics problems for us, which is completely different.

ks2048 4 hours ago||

Timothy Gowers' tweet about this: "If you are a mathematician, then you may want to make sure you are sitting down before reading further."

woah.

missyougowers 3 hours ago|

Unfortunately Gowers has taken Tao's lead on this one.

Gowers has one of my favourite video series about how he approaches a problem he is unfamiliar with: https://www.youtube.com/watch?v=byjhpzEoXFs

It is disheartening to see him jump into this GenAI puffery.

I hope these GenAI labs are paying Tao handsomely for legitimizing their slop, but more likely he's feeling pressure from his University to promote and work with these labs.

My guess is Gowers wants in on that action, or his University does.

Either way, it makes me sad. If it's self-motivated... even sadder.

horhay 1 hour ago|||

I'm not sure your characterization of Tao is accurate lol. In that companion paper, only Gowers seems to extensively show no pragmatism in the implications of this accomplishment. Even the younger math experts in that paper were a lot more cautious with their statements. Tao seems to follow that same tune most of the time even though he uses AI for first-pass inspections of solutions brought to his attention.

missyougowers 49 minutes ago||

Tao was absent from the formal verification circles until GenAI orgs saw formal verification as a way to legitimize their obscene existence, and since has been making the rounds on the podcast bro circuit pumping up these GenAI orgs.

His university is deeply entrenched with the GenAI org that released this result both with having alumni on staff, integrating their tools into the school's processes and curriculum, and paying for lots of grants. (I understand Tao is absent from this specific announcement, perhaps because it found its solution without utilizing formal verification tooling)

Is it unreasonable to assume he's feeling pressure to do so?

Gowers similarly appeared largely uninterested in this current crop of GenAI until some months ago when he announced a 9M$ fund to develop "AI for Maths" and since then his social media has included GenAI promotion.

Now he is being asked about this result and his first sentence is:

> I do not have the background in algebraic number theory to make a detailed assessment of the disproof of Erdős’s unit-distance conjecture, so instead I shall make some tentative comments about what it tells us about the current capabilities of AI.

Why did this GenAI org reach out to mathematicians outside of the discipline that this result addresses?

Why did they respond?!

horhay 32 minutes ago||

I think the intention of this paper is to build some type of culture of "math generalists" that doesn't quite exist in today's academia. The thing is that a good half of the people in that paper were actually very pragmatic about the implications of such a success and raised questions about the measurability of the difficulty of the problem and the generalizability of the solution provided for other questions. Gowers in particular offers no resistance and in fact resorts to the theatrics of "being the bearer of bad news" on Twitter for some reason.

As with Tao, he's always been a measured optimist even before the tools were consistently usable for his work. And even still nowadays, he adds stipulations to his statements on the successes of AI. Yes, he's part of Math Inc. now and is in close contact with Google Deepmind for some projects but his interest lies in using the tools today. Gowers has been hypothesizing on the future of math in the tone he has taken now ever since o3/GPT5. There's no comparison between the two who should attract more scrutiny.

cm2012 1 hour ago||||

It seems like you have an axe to grind about AI capabilities that is making you think irrationally.

missyougowers 45 minutes ago||

This is a popular HNism.

Focusing solely on "capabilities" is the irrational thinking.

Asbestos is the most "capable" material where extreme thermal, chemical and electrical resistance is required.

Lost-Futures 51 minutes ago||||

Ngl, this sounds like a defensive coping mechanism

aroman 2 hours ago|||

Are you saying this result is uninteresting and therefore AI slop or puffery? Obviously OpenAI has a motivation to "market" the accomplishment as much as possible, but surely you agree it IS a remarkable achievement?

missyougowers 39 minutes ago||

I'll let the mathematicians in the field determine the level of "interest" in this result, but saying "you may want to make sure you are sitting down" is pure puffery.

> has a motivation to "market" the accomplishment as much as possible

I am so sick of HN promoting unethical behaviour as virtuous due to its financialization worship at the foot of "valuations".

> but surely you agree it IS a remarkable achievement?

If you could define the bounds of "remarkable" I could answer this question.

horhay 25 minutes ago||

It's remarkable, but it's not so far out of the bounds of the pattern of success that AI has had with math recently that people should sound alarm bells.

A lot of the weight this holds comes from the fact that it's an old problem and that its difficulty hinges on the lack of investigation into the disproof side of the hypothesis. The model basically took a contrarian path and found tools and methods showing that a disproof is viable. So the (unquantified number of) mathematicians out there were all dedicating their resources to the notion that this can be proved. Some with hindsight would say that if they had a team of experts driven to the goal of disproof, this would have been achievable by humans, and one of the mathematicians in the paper states as much. This still has value in terms of reliability measurement, and possibly for human-aided endeavors when the methods scrounged by the model can be used in other solutions.

alansaber 5 hours ago||

AI isn't going to supercharge science but I wouldn't be as dismissive as other posters here.

tombert 5 hours ago||

I'm not a scientist but I like to LARP as one in my free time, and I have found ChatGPT/Claude extremely useful for research, and I'd go as far as to say it supercharged it for me.

When I'm learning about a new subject, I'll ask Claude to give me five papers that are relevant to what I'm learning about. Often three of the papers are either irrelevant or kind of shit, but that leaves 2/5 of them that are actually useful. Then from those papers, I'll ask Claude to give me a "dependency graph" by recursing on the citations, and then I start bottom-up.

This was game-changing for me. Reading advanced papers can be really hard for a variety of reasons, but one big one can simply be because you don't know the terminology and vernacular that the paper writers are using. Sometimes you can reasonably infer it from context, but sometimes I infer incorrectly, or simply have to skip over a section because I don't understand it. By working from the "lowest common denominator" of papers first, it generally makes the entire process easier.

I was already doing this to some extent prior to LLMs, as in I would get to a spot I didn't really understand, jump to a relevant citation, and recurse until I got to an understanding, but that was kind of a pain in the ass, so having a nice pretty graph for me makes it considerably easier for me to read and understand more papers.
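For what it's worth, the "bottom-up" reading order described above is essentially a topological sort of the citation graph. A minimal sketch in Python, assuming a small hand-made graph (the paper names below are hypothetical placeholders, not real citations, and this is not necessarily how any LLM produces its graph):

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical citation data: each paper maps to the set of papers it cites.
citations = {
    "target_paper": {"survey_a", "method_b"},
    "survey_a": {"foundation_c"},
    "method_b": {"foundation_c"},
    "foundation_c": set(),
}

def reading_order(cites):
    """Return papers ordered so each one comes after everything it cites."""
    # TopologicalSorter treats the mapped values as predecessors,
    # so cited papers are emitted before the papers that cite them.
    return list(TopologicalSorter(cites).static_order())

order = reading_order(citations)
print(order)
```

The foundational paper comes out first and the target paper last. Real citation data can contain cycles (two preprints citing each other), in which case `static_order()` raises `graphlib.CycleError` and you'd have to break the cycle by hand.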

kingkongjaffa 5 hours ago||

One heuristic I used during my masters degree research thesis was to look for the seminal people or papers in a field by using google scholar to find the most cited research papers and then reading everything else by that author / looking at the paper's references for others. You often only need to go back 3-4 papers to find some really seminal/foundational stuff.

tombert 5 hours ago||

Yeah, that's actually how I discovered Leslie Lamport like ten years ago. I was looking for papers on distributed consensus, and it's hard not to come across Paxos when doing that. It turns out that he has oodles of really great papers across a lot of different cool things in computer science and I feel like I understand a lot more about this space because of it.

It doesn't hurt that Lamport is exceptionally good at explaining things in plain language compared to a lot of other computer scientists.

vatsachak 5 hours ago|||

I absolutely believe that AI will supercharge science.

I do not believe it will replace humans.

unsupp0rted 5 hours ago|||

I absolutely believe that AI will supercharge science and replace humans.

Why shouldn't it? Humans are poorly optimized for almost anything, and built on a substrate that's barely hanging together

lovecg 3 hours ago|||

I’d give humans some credit, they’re an adaptable bunch. AI won’t replace humans in the same way humans did not replace cockroaches. It’s a non-sequitur.

bsza 1 hour ago||

We generally don’t allow cockroaches to thrive in the spaces we claim for ourselves. Question is how much space (economic or otherwise) will AI claim for itself and whether there will be any left for us.

geraneum 4 hours ago||||

> Humans are poorly optimized for almost anything, and built on a substrate that's barely hanging together

Goodness gracious!

vatsachak 4 hours ago||||

Well, for starters AI doesn't have goals. If there was a super intelligence with goals, why would they work for us?

devttyeu 4 hours ago||

Fwiw if you trained an LLM in an RL sandbox that would require it to have goals, the output llm probably would "have goals"

stonogo 5 hours ago||||

Not like large language models, which only require tens of megawatts of power and use highly efficient Monte Carlo methods, eh

TheOtherHobbes 4 hours ago||

Individual humans are processing nodes on human culture as a whole, which runs on rather more than tens of megawatts.

unsupp0rted 4 hours ago||

Also it costs a lot to train and run individual humans, and they can only be run for brief periods per day before they crash, hallucinate and possibly get irretrievably broken.

seydor 5 hours ago|||

replace, no. obsolete, yes

dvfjsdhgfv 5 hours ago||

lol

(That's the first time I used that expression on HN.)

comboy 5 hours ago|||

Not only has it supercharged science, it supercharges scientists. Research on any narrow topic is a different world now. Agents can read 50 papers for you and tell you what's where. This was impossible with pure text search. Looking up non-trivial stuff and having complex things explained to you is also amazing. I mean they don't even have to be complex; they can be from an adjacent field, where they are basics but happen to be useful in yours. The list goes on. It's a hammer: you need to watch your fingers, and it's not good at cutting wood, but it's definitely worth having.

dvfjsdhgfv 5 hours ago||

It's a very heavy hammer. I used it in the way you describe and after double-checking noticed some crucial details were missed and certain facts were subtly misrepresented.

But I agree with you, especially in areas where they have a lot of training data, they can be very useful and save tons of time.

Karrot_Kream 4 hours ago||

I don't think there's a substitute for reading the source material. You have to read the actual paper that's cited. You have to read the code that's being sourced/generated. But used as a reasoning search engine, it's a huge enabler. I mean so much of research literally is reasoning through piles of existing research. There's probably a large amount of good research (especially the kind that doesn't easily get grant funding) that can "easily" shake out through existing literature that humans just haven't been able to synthesize correctly.

horhay 1 hour ago|||

It's a very complicated matter honestly. This is a new height that AI has reached, even though it follows the usual methods of success that it has had.

What strikes me as unusual though is that they do make a point of saying things like "this is a general purpose model that wasn't trained on the problem" among a few other things as if that's new. The last bountied problem they accomplished used a public model that ALSO didn't rely on specialized training. And that didn't make their blog.

OldGreenYodaGPT 5 hours ago|||

Isn’t that a joke? It already has supercharged science

ks2048 5 hours ago|||

Since "supercharged science" is as ill-defined as AGI, ASI, etc., people will be able to debate it endlessly for no reason.

datsci_est_2015 5 hours ago|||

Where are the second order effects of this supercharging of science? Or has it not been enough time for those to propagate?

renegade-otter 5 hours ago|||

It will notice things that humans may have missed. That said - it can only work off the body of work SOMEONE did in the past.

throw-the-towel 5 hours ago|||

> it can only work off the body of work SOMEONE did in the past.

And so do humans. Gotta stand on these shoulders of giants.

bel8 5 hours ago|||

Can't the previous body of work be from AI too?

renegade-otter 4 hours ago||

Of course it can be, but it's overeager. No matter what your context window is, we will use AI collectively to flood the zone with shit.

karmasimida 5 hours ago||

To be strict, Math is not Science.

But AI is supercharging Math like there is no tomorrow.

anthk 3 hours ago||

LLM's? I doubt it. Systems with Prolog, Common Lisp and the like with proof solvers? For sure.

LLMs are doomed to fail. By design. You can't fix them. It's how they work.

karmasimida 2 hours ago||

You can have a word with Terence Tao, he has a different opinion here

agentultra 3 hours ago||

I’m curious about the “autonomous” claim. Usually these systems require a human to guide and verify steps, clarify problems, etc. Are they claiming that the reinforcement model wasn’t given any inputs, tools, guidance, or training data from humans?

overgard 2 hours ago||

I think it's worth being skeptical of this.. there's a way too common pattern of "AI Lab Shows AI Doing Something Only Humans Can Do" only for a bunch of important caveats and limitations to be discovered after the initial hype. And of course, the correction never seems to be as viral as the hype. I'll believe it when a mathematician actually reads the 100+ pages of reasoning.

adt 2 hours ago||

https://lifearchitect.ai/asi/

sinuhe69 2 hours ago||

How did they jump from finding counter-examples (disproof) to a proof?

atleastoptimal 4 hours ago|

To all AI skeptics:

What is preventing AI from continuing to improve until it is absolutely better than humans at any mental task?

If we compare AI now vs 2022 the difference is outstandingly stark. Do you believe this improvement will just stop before it eclipses all humans in everything we care about?

davebren 2 hours ago||

> What is preventing AI from continuing to improve until it is absolutely better than humans at any mental task?

No matter how much compute time it's given to combine training samples with each other and run through a validation engine it will still be missing some chunk of the "long tail". To make progress in the long tail it would need to have understanding, and not just a mimicry of understanding. Unless that happens they will always be dependent on the humans that they are mimicking in order to improve.

atleastoptimal 2 hours ago|||

What is the difference between what LLM's do and "true" understanding?

I feel like people grasping at straws over the shrinking limitations of AI systems are just copying the "god of the gaps" fallacy

davebren 2 hours ago||

> What is the difference between what LLM's do and "true" understanding?

The thing where you can understand the meaning of this sentence without first compiling a statistical representation of a 10 trillion line corpus of training data.

Unless you're an NPC of course.

smashers1114 2 hours ago||

I mean brains get a lot of training data too in order to understand language. I don't think you provided a relevant difference.

Or rather, maybe I don't understand what you mean :)

davebren 2 hours ago||

When you think about the word apple and what it signifies, what do you experience? Is there a feeling of "appleness"? Do you think that sense of meaning is equivalent to the numerical weights of an LLM?

enoint 4 hours ago|||

That’s one possibility. If it fails to convince a critical mass that it’s a net improvement in their lives, then the impediment to continual improvement will be sabotage.

KalMann 3 hours ago|||

I think there's been natural but steady progress since 2024 with the release of the o1 model, which showed impressive reasoning capabilities. But I think it's wrong to look at the magnitude of the accomplishments and assume that will be field independent. We don't know the range of problems reasoning techniques are useful for. What we see here is refinement of capabilities that have been noticeable for years.

layer8 2 hours ago|||

> everything we care about

One qualitative distinction that remains for the time being is that humans care about things while AIs do not. Human drive and motivation is needed to have AI perform tasks.

Of course, this distinction isn’t set in stone.

rzmmm 3 hours ago|||

Maybe after decades. 2022 models were microscopic compared to latest models.

gowld 1 hour ago|||

It depends on if AI can invent cold fusion before running out of all the energy on Earth.

xandrius 4 hours ago||

You should really look up a video about what GPTs fundamentally are.

Rover222 4 hours ago||

You should also really look up a video about what neural synapses really are.
