Posted by tedsanders 6 hours ago
edit: apparently that’s only the _condensed summary_ of the chain of thought.
Many of my colleagues and I have been experimenting with LLMs in our research process. I've had pretty great success, though fairly rarely do they solve my entire research question outright like this. Usually, I end up with a back-and-forth process of refinements and questions on my end until eventually the idea becomes apparent. Not unlike my traditional research refinement process, just better. Of course, I don't have access to the model they're using =) .
Nevertheless, one thing that struck me in this writeup was the lack of attribution in the quoted final response from the model. In a field like math, where most research is posted publicly and is available, attribution of prior results is both social credit and how we find/build abstractions and concentrate attention. The human-edited paper naturally contains this. I dug through the chain-of-thought publication and did actually find (a few of) the attributions. If people working on these LLMs are reading, it's very important to me that these attributions are contained in the actual model output.
One more note: the comments on articles like these on HN and otherwise are usually pretty negative / downcast. There's great reason for that, what with how these companies market themselves and how proponents of the technology conduct themselves on social media. Moreover, I personally cannot feel anything other than disgust seeing these models displace talented creatives whose work they're trained on (often to the detriment of quality). But, for scientists, I find that these tools address the problem of the exploding complexity barrier in the frontier. Every day, it grows harder and harder to contain a mental map of recent relevant progress by simple virtue of the amount being produced. I cannot help but be very optimistic about the ambition mathematicians of this era will be able to scale to. There still remain lots of problems in current era tools and their usage though.
To be blunt, this seems incredibly uninteresting to me. I enjoy learning mathematics, sure, but I just don't find much inherent meaning in reading a textbook or a paper. The meaning comes from taking those ideas and applying them to my own problems, be it a direct proof of a conjecture or coming up with the right framework or tools for those conjectures. But, of course, in this future, those proofs and frameworks are already in the textbook. So what's the point? If someone cared about these answers in the first place, they probably could have found the right prompt to extract them from this phantom textbook anyways.
You could argue that there is still work to do, like marginal improvements and applying the returned proof to other scenarios, as happened in this case, but as above, what is there really to do if this is already in the phantom textbook somewhere and you just need to prompt better? The mathematicians in this case added to the exposition of the proof, but why wouldn't the phantom textbook already have good enough exposition in the first place?
I think my complete dismissal of the value of things like extending the proofs from an LLM or improving exposition is too strong -- there is value in both of them, and likely always will be -- but it would still represent a sharp change in what a mathematician does that I don't think I am excited for. I also don't think this phantom textbook is contained even in the weights of whatever internal model was used here just yet (especially since, as some of the mathematicians in the article pointed out, a disproof here did not need to build any new grand theories), but it really does seem to me it eventually will be, and I can't help but find the crawl towards that point somewhat discouraging.
Who cares if it is God's book or the machine's Xeroxed copy?
Along with all the rest of what humans find meaningful and fulfilling.
Not so many years from now, some of them will surpass you. A few years after that all (that survive to that point) will surpass you.
Does that terrify you just as much?
Perhaps your name-calling is not actually as logically grounded as you think. It definitely seems to depend on unfounded leaps.
This technology is solving interesting math/physics problems for us, which is completely different.
woah.
Gowers has one of my favourite video series about how he approaches a problem he is unfamiliar with: https://www.youtube.com/watch?v=byjhpzEoXFs
It is disheartening to see him jump into this GenAI puffery.
I hope these GenAI labs are paying Tao handsomely for legitimizing their slop, but more likely he's feeling pressure from his University to promote and work with these labs.
My guess is Gowers wants in on that action, or his University does.
Either way, it makes me sad. If it's self-motivated... even sadder.
His university is deeply entrenched with the GenAI org that released this result, which has alumni on staff, has its tools integrated into the school's processes and curriculum, and pays for lots of grants. (I understand Tao is absent from this specific announcement, perhaps because it found its solution without utilizing formal verification tooling.)
Is it unreasonable to assume he's feeling pressure to do so?
Gowers similarly appeared largely uninterested in the current crop of GenAI until some months ago, when he announced a $9M fund to develop "AI for Maths", and since then his social media has included GenAI promotion.
Now he is being asked about this result and his first sentence is:
> I do not have the background in algebraic number theory to make a detailed assessment of the disproof of Erdős’s unit-distance conjecture, so instead I shall make some tentative comments about what it tells us about the current capabilities of AI.
Why did this GenAI org reach out to mathematicians outside of the discipline that this result addresses?
Why did they respond?!
As with Tao, he's always been a measured optimist, even before the tools were consistently usable for his work. And even nowadays, he adds stipulations to his statements on the successes of AI. Yes, he's part of Math Inc. now and is in close contact with Google Deepmind for some projects, but his interest lies in using the tools today. Gowers has been hypothesizing on the future of math in the tone he has taken now ever since o3/GPT5. There's no comparison between the two as to who should attract more scrutiny.
Focusing solely on "capabilities" is the irrational thinking.
Asbestos is the most "capable" material where extreme thermal, chemical and electrical resistance is required.
> has a motivation to "market" the accomplishment as much as possible
I am so sick of HN promoting unethical behaviour as virtuous due to its financialization worship at the foot of "valuations".
> but surely you agree it IS a remarkable achievement?
If you could define the bounds of "remarkable" I could answer this question.
A lot of the weight this holds comes from the fact that it's an old problem and that its difficulty hinges on the lack of investigation into the disproof side of the hypothesis. The model basically took a contrarian path and found tools and methods that support that a disproof is viable. So the (unquantified number of) mathematicians out there were all dedicating their resources to the notion that this can be proved. Some with hindsight would say that if they had a team of experts driven to the goal of disproof, this would have been achievable by humans, and one of the mathematicians of the paper states as much. This still has value in terms of reliability measurement, and possibly for human-aided endeavors when the methods scrounged by the model can be used in other solutions.
When I'm learning about a new subject, I'll ask Claude to give me five papers that are relevant to what I'm learning about. Often three of the papers are either irrelevant or kind of shit, but that leaves 2/5 of them that are actually useful. Then from those papers, I'll ask Claude to give me a "dependency graph" by recursing on the citations, and then I start bottom-up.
This was game-changing for me. Reading advanced papers can be really hard for a variety of reasons, but one big one is simply that you don't know the terminology and vernacular that the paper writers are using. Sometimes you can reasonably infer it from context, but sometimes I infer incorrectly, or simply have to skip over a section because I don't understand it. Working from the "lowest common denominator" of papers first generally makes the entire process easier.
I was already doing this to some extent prior to LLMs, as in I would get to a spot I didn't really understand, jump to a relevant citation, and recurse until I got to an understanding, but that was kind of a pain in the ass, so having a nice pretty graph for me makes it considerably easier for me to read and understand more papers.
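For anyone who wants to do the bottom-up ordering by hand, here's a minimal sketch, assuming you already have the citation graph as a plain dictionary. The paper names and the `reading_order` helper are made up for illustration; it just runs a topological sort with Python's standard `graphlib` so that every paper comes after the papers it cites.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def reading_order(citations):
    """Return papers ordered so every paper comes after the papers it cites."""
    # TopologicalSorter expects {node: predecessors}; here a paper's
    # predecessors are the papers it cites (its prerequisites).
    return list(TopologicalSorter(citations).static_order())


# Hypothetical citation graph: each paper maps to the papers it cites.
citations = {
    "advanced_paper": {"intermediate_a", "intermediate_b"},
    "intermediate_a": {"foundational_paper"},
    "intermediate_b": {"foundational_paper"},
    "foundational_paper": set(),
}

order = reading_order(citations)
print(order)  # foundational paper first, advanced paper last
```

The relative order of the two intermediate papers isn't guaranteed, and `graphlib` raises `CycleError` if two papers somehow cite each other, so a real citation dump may need cleaning first.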
It doesn't hurt that Lamport is exceptionally good at explaining things in plain language compared to a lot of other computer scientists.
I do not believe it will replace humans.
Why shouldn't it? Humans are poorly optimized for almost anything, and built on a substrate that's barely hanging together.
Goodness gracious!
(That's the first time I used that expression on HN.)
But I agree with you, especially in areas where they have a lot of training data, they can be very useful and save tons of time.
What strikes me as unusual though is that they do make a point of saying things like "this is a general purpose model that wasn't trained on the problem" among a few other things as if that's new. The last bountied problem they accomplished used a public model that ALSO didn't rely on specialized training. And that didn't make their blog.
And so do humans. Gotta stand on these shoulders of giants.
But AI is supercharging Math like there is no tomorrow.
LLMs are doomed to fail. By design. You can't fix them. It's how they work.
What is preventing AI from continuing to improve until it is absolutely better than humans at any mental task?
If we compare AI now vs 2022 the difference is outstandingly stark. Do you believe this improvement will just stop before it eclipses all humans in everything we care about?
No matter how much compute time it's given to combine training samples with each other and run through a validation engine it will still be missing some chunk of the "long tail". To make progress in the long tail it would need to have understanding, and not just a mimicry of understanding. Unless that happens they will always be dependent on the humans that they are mimicking in order to improve.
I feel like people grasping at straws over the shrinking limitations of AI systems are just repeating the "god of the gaps" fallacy.
The thing where you can understand the meaning of this sentence without first compiling a statistical representation of a 10 trillion line corpus of training data.
Unless you're an NPC of course.
Or rather, maybe I don't understand what you mean :)
One qualitative distinction that remains for the time being is that humans care about things while AIs do not. Human drive and motivation are needed to have AI perform tasks.
Of course, this distinction isn’t set in stone.