
Posted by _alternator_ 10 hours ago

A recent experience with ChatGPT 5.5 Pro (gowers.wordpress.com)
https://twitter.com/wtgowers/status/2052830948685676605

https://xcancel.com/wtgowers/status/2052830948685676605

376 points | 211 comments | page 3
zingar 3 hours ago|
The post talks about LLM+human contributions being recognized in some different category from human-only. But is it possible to spot the difference between the two?
adammdaw 7 hours ago||
This is certainly interesting, though I would say that, based on my understanding of how the current models work, combinatorial problems would be an area where they could be particularly successful. They are pretty good at combinatorial creativity; it's the exploratory and transformational aspects that are still pretty tricky, and I expect those would come to bear in other areas of mathematics.
hodgehog11 5 hours ago|
Indeed, analysis is a bit more loose in its arguments, and so I've found LLMs tend to make more mistakes there.
__rito__ 7 hours ago||
> So maybe there should be a different repository where AI-produced results can live.

Does the author know about CAISc 2026 [0]?

[0]: https://caisc2026.github.io

ionwake 4 hours ago||
One thing I was wondering: if LLMs are word completers that seem to come up with new solutions, could this just be because material that was once kept secret is no longer secret, due to ingestion into training data? I don't know enough about it, though.
dist-epoch 2 hours ago|
Why would you keep this particular mathematical idea secret? It's not extraordinarily important, it's not on the path to some other major result, and it doesn't seem useful in financial trading. Even the author calls it a reasonable problem for a PhD thesis.
incrediblylarge 7 hours ago||
A month ago my PhD supervisor told me it rips on proofs but he also said it's useless when formalising arguments in Lean - is this still the case?
vjerancrnjak 7 hours ago|
Nope. Codex formalizes much better than any tool, with the exception of Aristotle from Harmonic.

https://github.com/vjeranc/fixed-rtrt

The M3 module was formalized fully, purely from experimental data and a nudge from earlier versions of Codex, in 15-30 minutes in a simple write/compile/fix-first-error loop. I was a bit surprised how fast it picked up the pattern, but given there was a paper from the '70s, it became clear why later.

casey2 2 hours ago||
I think mathematicians like LLMs because this is the first time we have something like a computer for the kinds of math most people do: high-level, hand-wavy abstractions that are (relatively) easy for people to grok but hard to explain to traditional computers.
CharlesLau 7 hours ago||
Is the assessment system of undergraduate mathematics education no longer effective?
margalabargala 7 hours ago||
Undergraduate? No. We've had calculators able to solve undergraduate problems for decades. AI doesn't change the need to understand how calculus works any more than calculators did. The foundations remain valuable.

Graduate? Yes.

whatever120 7 hours ago||
How should graduate school be changed, then? Specifically for mathematics.
dyauspitr 7 hours ago||
90% of the final grade should come from in-room examinations with proctors: maybe two sets of exams, midterms and finals, from which the vast majority of the final grade comes. This is already how most of East and South Asia does it anyway, and it's probably the best approach.

For publications and theses, as long as the final results hold and can be replicated and validated, I don’t see why we shouldn’t allow the wholesale use of LLMs

zozbot234 3 hours ago||
> 90% of the final grade should come from in-room examinations with proctors: maybe two sets of exams, midterms and finals, from which the vast majority of the final grade comes.

This is really just a glorified undergraduate education; the real point of graduate school is to learn to do real-world-relevant research. For the latter, I think LLM use will be accepted, but there will be a heavy expectation on the author to make the result very easily digestible for human mathematicians and to link it thoroughly with the existing literature. That is something LLMs are very much not successful at, but a student might do it quite well with a mixture of expert guidance and personal effort.

dyauspitr 7 hours ago||
I don't think it's just mathematics. We don't hear enough about this, but if I think back to my undergraduate years, which were less than 10 years ago, every homework assignment and every take-home exam I had would be trivial for LLMs to solve at this point. I wonder what is actually happening on the ground.
crocdundae 3 hours ago||
Well... here's something from "boots on the ground": I teach on a bachelor's degree where programming is a smallish facet of the curriculum. My course is the last in a series of three that progressively introduce more concepts and make practical implementations more feasible. I used to be able to grade the course purely on submissions to take-home exercises, some of which are complex, some trivial. When ChatGPT (& Co.) came along I was still able to do that, but with a major added workload (suddenly everyone started producing mountains of code, often nonsensical, but I still had to read it all). I always requested targeted, atomic changes to code (vs. rewrites), which served me well up to a point (I was still able to grade fairly). I originally requested them to avoid "GitHub copies", but that worked reasonably well against ChatGPT too. However, when Claude Code came along, it was obvious I was losing the battle. It does not particularly matter to me whether students use AI, as long as the rows they add and alter in the assignments make sense, but the "last nail in the coffin" with Claude Code is that in the latest batch (this spring) it is clear some students "pay themselves" a good grade (i.e. they pay for Claude Code, thus bypassing the need to actually learn). I cannot make assignments that are complex enough to trip up Claude Code on something and still humane for those who do not use AI or only use free chatbot options. Essentially, Claude Code plays havoc with the whole grading process: students not using it (whether they write code fully manually or ChatGPT-assisted) are left with far fewer points than students who just push all the code I give to Claude Code and "let it rip" for some 15 minutes. This really irks me. So, my solution? Still working on it and hoping to find one!
For sure, no more points from most take-home assignments: the lowest grades will still be achievable through them (the trivial ones), but that's it; the rest is preparation for an exam. Practically, this already means anyone with ChatGPT is going to pass, no doubt about it. As for the higher grades, I'm now desperately figuring out how to make a meaningful paper-based exam for my course this autumn. I myself completed a master's degree writing C on paper with a pencil. I sure did not want to start doing that to others, but here we are. Besides, back in my youth the only "library" was pretty much the ANSI parts of C! I'm not sure what kind of two-inch-thick stack of papers I'd have to give my students as reference material in an exam these days. One horrible aspect is that students are now far more dependent on compiler errors to spot pretty much anything and everything... I worry the first paper exam from me will be a total horror story for us all. In any case, interesting times.
zuogl 5 hours ago||
The HTML generation is surprisingly good because the training corpus for markup is cleaner than most programming languages.
globular-toast 6 hours ago|
I wish people would stop generating stuff they don't understand only to forward it to someone who does. Something about that really rubs me the wrong way.
hodgehog11 5 hours ago||
May I remind you that this is Timothy Gowers. He says he doesn't fully understand it, but he most certainly has far greater capacity than most to distinguish complete junk from a maybe-plausible argument. His colleague is even better placed to judge this, which is why he sent it to him.

Also, if he did send me complete junk, I would still pore over it for multiple days to see what is there.

auggierose 3 hours ago||
Lol. If Gowers sends you a piece of math he doesn't quite understand because he thinks that you might, that is something you celebrate.