Posted by _alternator_ 8 hours ago

A recent experience with ChatGPT 5.5 Pro (gowers.wordpress.com)
https://twitter.com/wtgowers/status/2052830948685676605

https://xcancel.com/wtgowers/status/2052830948685676605

330 points | 184 comments
ziotom78 4 hours ago|
I am a physics professor and often use Gemini to check my papers. It is a formidable tool: it was able to find a clerical error (a missing imaginary unit in a complex mathematical expression) I was not able to find for days, and it often underlines connections between concepts and ideas that I overlooked.

However, it often makes conceptual errors that I can spot only because I have good knowledge of the topic I am discussing. For instance, in 3D Clifford algebras it repeatedly confuses exponential of bivectors and of pseudoscalars.
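
For readers who have not met these objects, here is a sketch of the standard distinction, assuming the Euclidean algebra Cl(3,0) with an orthonormal basis $e_1, e_2, e_3$. The symbols $\hat{B}$, $I$ and $R$ are notation assumed for this note, not taken from the comment. Both exponentials have the same Euler-like form, but they act very differently on vectors:

```latex
\begin{align*}
\hat{B}^2 &= -1 \ \text{(unit bivector, e.g. } \hat{B} = e_1 e_2\text{)}:
  & e^{\theta \hat{B}} &= \cos\theta + \hat{B}\,\sin\theta, \\
I &= e_1 e_2 e_3,\quad I^2 = -1:
  & e^{\theta I} &= \cos\theta + I\,\sin\theta .
\end{align*}
% A bivector exponential is a rotor: it does not commute with vectors in its
% plane, so R = e^{-\theta \hat{B}/2} rotates v via the sandwich product
\[
  v \mapsto R\, v\, R^{-1}
  \quad\text{(rotation by angle } \theta \text{ in the plane of } \hat{B}\text{)}.
\]
% The pseudoscalar is central in 3D (it commutes with every element), so the
% same sandwich leaves every vector unchanged:
\[
  e^{\theta I}\, v\, e^{-\theta I} = v .
\]
```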

Good to know that ChatGPT 5.5 Pro can produce a publishable paper, but from what I have seen so far with Gemini, it seems to me that it is better to consider LLMs as very efficient students who can read papers and books in no time but still need a lot of mentoring.

nopinsight 3 hours ago||
I assume you're using the "regular" Pro version of Gemini 3.1 for the above, rather than the Deep Think mode, which is more comparable to GPT-5.5 Pro. To my knowledge, regular 3.1 Pro is a tier below and often makes mistakes.

Moreover, there's no reason to believe the progress of LLMs, which couldn't reliably solve high-school math problems just 3–4 years ago, will stop anytime soon.

You might want to track the progress of these models on the CritPt benchmark, which is built on *unpublished, research-level* physics problems:

https://critpt.com/

Frontier models are still nowhere near solving it, but progress has been rapid.

* o3 (high), less than 1.5 years ago: 1.4%

* GPT-5.4 (xhigh): 23.4%

* GPT-5.5 (xhigh): 27.1%

* GPT-5.5 Pro (xhigh): 30.6%

https://artificialanalysis.ai/evaluations/critpt

FrojoS 1 hour ago|||
> there's no reason to believe the progress of LLMs [...] will stop anytime soon

Wrong. Every advancement has followed an S-curve. Where we are on that curve is anyone's guess. Or maybe "this time it's different".

aurareturn 1 hour ago||
He said "will stop anytime soon". He didn't say forever.
Lionga 1 hour ago||
Which still makes no sense. There is the same chance we are flatlining now as that we are flatlining in e.g. 3 years or 5 years.
squidbeak 20 minutes ago||
In what sense are the models flatlining?
civvv 2 hours ago|||
There are many indications that model progress is slowing down, so that is not entirely accurate.
StrauXX 1 hour ago||
Which indications are those?
lionkor 30 minutes ago||||
Nobody is releasing NEW models
overfeed 1 hour ago|||
Investment dollars.
dzhiurgis 14 minutes ago||
Source for that claim?
illiac786 2 hours ago|||
Using the word “mentoring” is anthropomorphic and subconsciously makes you think it will learn. It does not, and it is a formidable task for the human brain to remember that something as smart as an LLM does not learn. I keep catching myself making the same mistake.

It’s also because it is so annoying to have to manage the memory of the LLM with custom prompts/instructions manually.

I have not yet played with the long term memory feature, but I fear it will be even less reliable than prompts, simply because in one year or two years so much will have changed again that this “memory” will have to be redone multiple times by then.

kybernetikos 19 minutes ago|||
Current LLM architecture doesn't learn - and you're right this is a huge piece that normal folks fail to understand, since in many ways, it's the opposite of what years of AI research has been trying to create.

However, I think it's important to remember that LLMs are embedded in larger systems, and those larger systems do learn.

timschmidt 2 hours ago||||
They can form new associations between concepts via their input prompts and thinking text. That is a form of learning. Just not very durable. I liken it to https://en.wikipedia.org/wiki/Anterograde_amnesia
illiac786 2 hours ago||
yeah, I should have been more specific: I meant the type of learning that mentoring fosters, the long term learning.
timschmidt 2 hours ago||
I hear you. I think we are already seeing some middle ground with agentic systems using RAG, skills.md files, etc. It's a sort of disassociated card catalog memory. An engineer's notebook. Not the integrated, correlated, pre-processed set of relationships in the model. How to go backward from the notebook -> model cheaply without tanking performance is definitely one of those billion dollar questions.
freedomben 1 hour ago|||
I mostly agree, though after a mentoring session you can ask it to write a skill or a memory and it can be reasonably durable. For Claude at least, the memories work pretty well (though I am still at a small scale with them; as they grow it might start to break somewhat). Doesn't always work, but has often enough that I thought it worth a mention.
maximamas 3 hours ago|||
LLMs are at their best when you have an expectation for their output. I generally know the shape of the correct response and that allows me to evaluate its output on its "vibes", rather than line by line. If there's no expectation then I have to take everything at face value and now I'm at the mercy of the machine.
jillesvangurp 2 hours ago|||
Exactly, if I generate a large chunk of software, I'm going to have expectations about what it will do, how it will do it, etc. You don't just accept the statement that "it's done" as fact but you start looking for evidence.

A scientific approach here is to look to falsify the statement. You start asking questions, running tests, experiments, etc. to prove wrong the notion that it is done. And at some point you run out of such tests and it's probably done for some useful notion of done-ness.

I've built some larger components and things with AI. It's never a one shot kind of deal. But the good news is that you can use more AI to do a lot of the evaluation work. And if you align your agents right, the process kind of runs itself, almost. Mostly I just nudge it along. "Did you think about X? What about Y? Let's test Z"

noisy_boy 42 minutes ago||
> Mostly I just nudge it along. "Did you think about X? What about Y? Let's test Z"

Exactly - you need to constantly have your sceptic's glasses on and you need to be exacting in terms of the structure you want things to follow. Having and enforcing "taste" is important and you need to be willing to spend time on that phase because the quality of the payoff entirely depends on it.

I recently planned for a major refactor. The discussion with Claude went on for almost two days. The actual implementation was done in 10 minutes. It probably has made some mistakes that I will have to check for during the review, but given the level of detail that plan document had, it is certainly 90-95% there. After pouring in that much opinion, it is a fairly good representation of what I would have written while still being faster than me doing everything by hand.

ziotom78 2 hours ago|||
I agree, but I would add that they can be very useful even if you do not have clear expectations but have some solid ways to verify their claims. Often in doing this verification I came up with new ideas.
tags2k 3 hours ago|||
I'm no physics professor but this aligns with the way I use the tools in my "senior engineer" space. I bring the fundamentals to sanity-check the trigger-happy agent and try to imbue other humans with those fundamentals so they can move towards doing the same. It feels like the only way this whole thing will work (besides eventually moving to local models that do less but companies can afford).
_the_inflator 1 hour ago|||
I agree and put it this way: LLMs sound so convincing, presenting the work they do in rose-colored terms and promising to give you more if you keep going.

There is a 50/50 chance that it turns out to be right or lets you jump off the cliff.

Either way, the trip stays the same beautiful five-star-plus travel.

Also, spotting an error and telling the LLM makes it worse in most cases, because the LLM wants to please you and goes on to apologize and change course.

The moment I find myself in such a situation I save or cancel the session and start from scratch in most cases or pivot with drastic measures.

Gemini to me is the most unpredictable LLM while GPT works best overall for me.

Gemini lately gave me two different answers to the same question. This was an intentional test because I was bored and wanted to see what happens if you simply open a new chat and paste the same prompt, everything else being the same.

Reasoning doesn’t help much in the coding domain for me because what the LLM comes up with as an explanation is very high level and formally right.

I google more than before because of LLMs: essentially, someone is producing something that I have to check first before I hit the button it comes with. And you only find out shortly afterwards whether the polished button worked or gave you a warm welcome to hell.

MattPalmer1086 42 minutes ago|||
Reusing the same prompt several times is something I've started doing too. The contrast is often illuminating.

In one case, it made a thoroughly convincing argument that an approach was justified. The second time it made exactly the opposite argument, which was equally compelling.

I now see LLMs as persuasion machines.

pbhjpbhj 55 minutes ago||||
>LLM wants to please you

I was using Copilot and asked it a question about a PDF file (a concept search). It turned out the file was images of text. I was anticipating that and had the text ready to paste in.

Instead, it started writing an OCR program in python.

I stopped it after several minutes.

Often Copilot says it can't do something (sometimes it's even correct); that's preferable to the try-hard behaviour here.

freedomben 1 hour ago|||
> Gemini to me is the most unpredictable LLM while GPT works best overall for me.

This nails an important thing IMHO. I've absolutely noticed this, for better or worse. Gemini can produce surprisingly excellent things, but its unpredictability makes me go for GPT when I only want to ask it once.

wccrawford 24 minutes ago|||
This doesn't surprise me since the coding agents are similar. I've previously compared them to very fast, ambitious junior programmers. I think they are probably mid-level coders now, but they continue to make mistakes that a senior programmer wouldn't. Or at least shouldn't.
Quothling 1 hour ago|||
We've got a rather extensive AI setup through our equity fund and I've set up a group of agents for data architecture at scale. One is the main agent I discuss with, and it's set up to know our infrastructure and has access to image generation tools, web search, hand-off agents and other things. I tend to use Opus (4-6 currently) and I find it to be rather great. As you point out it comes with the danger of making mistakes, and again, as you point out, it's not an issue for things I'm an expert on. What I rely on it for, however, is analysing how specific tools would fit into our architecture. In the past you would likely have hired a group of consultants to do this research, but now you can have an AI agent tell you what the advantages and disadvantages of Microsoft Fabric are in your setup. Since I don't know the capabilities of Fabric I can't tell if the AI gives me the correct analysis of a Lakehouse and a Warehouse (Fabric tools).

What I do to mitigate this is that I have fact-checking agents, configured to be extremely critical and non-biased, on Opus, Gemini and GPT, which are handed the entire conversation to review. Then it's handed off to an Opus agent which is set up to assume everything is wrong. After this, and if I'm convinced something is correct, I'll hand the entire thing off to a Sonnet agent, which is set up to go through the source material and give me a compiled list of exactly what I'll need to verify.
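
The hand-off chain described above could be sketched roughly as follows. This is only an illustration under assumptions: `review_pipeline`, the `Agent` type, `human_is_convinced` and the prompt strings are hypothetical names, and each agent is stubbed as a plain function from prompt to reply rather than a real model call.

```python
from typing import Callable, Dict, List

# Hypothetical: an "agent" is any function from a prompt string to a reply.
# In a real setup each one would wrap a call to a model; that is left out.
Agent = Callable[[str], str]


def review_pipeline(
    conversation: str,
    fact_checkers: Dict[str, Agent],
    skeptic: Agent,
    compiler: Agent,
    human_is_convinced: Callable[[str], bool],
) -> Dict[str, object]:
    """Critics review independently, a skeptic attacks the result, and, if
    the human is still convinced, a compiler lists what needs verifying."""
    # Stage 1: every fact checker sees the full conversation on its own.
    critiques = {
        name: agent("Critically review this conversation:\n" + conversation)
        for name, agent in fact_checkers.items()
    }
    merged = "\n\n".join(f"[{name}] {text}" for name, text in critiques.items())

    # Stage 2: the skeptic is told to assume everything is wrong.
    skeptic_report = skeptic(
        "Assume every claim below is wrong and explain why:\n"
        + conversation + "\n\n" + merged
    )

    # Stage 3: only after the human gate, compile a verification checklist
    # (one item per non-empty line of the compiler's reply).
    checklist: List[str] = []
    if human_is_convinced(skeptic_report):
        raw = compiler(
            "List exactly what must be verified against source material:\n"
            + conversation + "\n\n" + merged + "\n\n" + skeptic_report
        )
        checklist = [
            line.lstrip("-* ").strip() for line in raw.splitlines() if line.strip()
        ]

    return {"critiques": critiques, "skeptic": skeptic_report, "checklist": checklist}
```

Keeping the agents as injected functions means the chain can be exercised with canned replies before any real model is wired in, and the human gate stays an explicit step rather than something implied.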

It's ridiculously effective, but I do wonder how it would work with someone who couldn't challenge the analytic agent on domain knowledge it gets wrong. Because despite knowing our architecture and needs, it'll often make conceptual errors in the "science" (I'm not sure what the English word for this is) of data architecture. Each iteration gets better though, and with the image generation tools, "drawing" the architecture for presentations from c-level to nerds is ridiculously easy.

port11 49 minutes ago|||
Gemini’s smug and over-confident “this is the gold standard in 2026” definitely leaves little space for nuance if you don’t know the subject matter. Human students would, hopefully, know they don’t know everything.
quantummagic 37 minutes ago||
> Gemini’s smug...

Anthropomorphizing these systems is dangerous, whether coming from the bullish or bearish perspective. The output is statistically generated by a machine lacking the capability to be smug.

Jtarii 31 minutes ago||
>Anthropomorphizing these systems is dangerous

That ship has sailed. Humans will anthropomorphize a rock if you put googly eyes on it.

bartvk 25 minutes ago||
First I thought to myself, "my daughter does this and it looks so cute". And only as a second thought, that your comment just proved itself.
mixtureoftakes 3 hours ago|||
please, sign up for a paid plan of either chatgpt or claude. gemini, while close, is still noticeably behind

you deserve opinions shaped by interactions with the best tools that are out there.

wg0 3 hours ago|||
Gemini feels deep and philosophical. Especially for product management. Tell him you're a product manager and we're a team of two.

But regular reminder - All LLMs can be wrong all the time. I only work with LLMs in domains I'm expert in OR I have other sources to verify their output with utmost certainty.

wafflemaker 1 hour ago|||
Or when you don't care about results being very correct.

When I'm cooking meatballs with sauce and the recipe calls for frying them, I'll have an LLM guesstimate how long and which program to use in an air fryer to mimic the frying pan, based on a picture of balls in a Pyrex. So I can just move on with the sauce, instead of spending time browsing websites and stressing about getting it perfect.

I used to hate these non-deterministic instructions; now I treat them as their own game. When I publish my first recipe, I'll have an LLM randomize the ingredient amounts, round them up to some imprecise units and also randomize the times. Psychologists say we artists need to participate and I WILL participate.

smartmic 1 hour ago|||
> I only work with LLMs in domains I'm expert in

This. Should become a general rule for any non-trivial use of LLMs in a professional setting.

cubefox 3 hours ago||||
Gemini is certainly not behind Claude in terms of physics.
hodgehog11 2 hours ago||||
ChatGPT and Gemini are actually fairly comparable.

Claude has been utterly useless with most math problems in my experience because, much like less capable students, it tends to get overly bogged down in tedious details before it gets to the big picture. That's great for programming, not so much for frontier math. If you're giving it little lemmas, then sure it's great, but otherwise you're just burning tokens.

peyton 3 hours ago|||
Seriously, it’s not worth reaching for less intelligence. Use Extended Pro 100% of the time for things you’d spend the amount of time GP spent writing their post.
wood_spirit 3 hours ago|||
Chiming in to agree but clarify that the latest sota models are no better than Gemini.

I put my stuff through several sota models and round robin them in adversarial collaboration and they are all useful even though, fundamentally, they don’t “understand” anything. But they are super useful delegates as long as deciding on the problem and approach and solution all sits safely in your head so you can challenge them and steer them.

So I know the article is about one particular new model acing something and each vendor wants these stories to position their model as now good enough to replace humans and all other models, but working somewhere where I am lucky enough to be able to use all the sota models all the time, I can say that all keep making obvious mistakes and using all adversarially is way better than trusting just one.

I look forward to the day a small open model that we can run ourselves outperforms the sum of all today’s models. That’s when enough is enough and we can let things plateau.

energy123 1 hour ago||
Basically all Erdos problems that get solved with AI use ChatGPT 5.* Pro, not Gemini/Opus.
5555watch 35 minutes ago||
I would guess it's because ChatGPT Pro allows for 80min "think". I've never had even remotely similar think times with Gemini Deep Think. It's generally around 10-15min for math problems, and gets increasingly shorter for continued interactions.
tasuki 2 hours ago|||
> in 3D Clifford algebras it repeatedly confuses exponential of bivectors and of pseudoscalars.

I have no idea what any of those words even mean. I'm sure LLMs make similar obvious-to-professors mistakes in all the domains. Not long ago, we didn't even have chatbots capable of basic conversation...

recursivecaveat 3 hours ago|||
This is close to my experience with code. LLMs can pick out small mistakes from giant code changes with surprising accuracy, or slowly narrow down a weird bug. On the other hand I've seen them bravely soldier on under completely incorrect conceptual models of what they're working with and churn around in circles as a result, spin up giant piles of slop to re-implement something they decided was necessary but didn't bother to search for, or outright dismiss important error signals as just 'transient failures'. Unlimited stamina, low wisdom.
cyanydeez 3 hours ago|||
I've been watching the automation of things like flight control systems for the past decade, and the evolution of the fallback to a real pilot in the event of an emergency is what's most concerning about where LLMs are being embedded.

Right now, we have a lot of smart people who have trained for decades to understand where these things go wrong and how to nudge them back, but that pool of people is slowly going to be replaced by less knowledgeable ones.

At some point, a rubicon will be crossed where these systems can't fall back to a human operator and will fail spectacularly.

regularfry 25 minutes ago|||
What that means practically is that we've got a generation - 25 years or less - to evolve these things not to need the fallback. If such a thing is possible.
pbhjpbhj 46 minutes ago||||
Watching a teenager approach their homework, instead of struggling to answer questions they don't know, they ask Gemini. Unfortunately, I think the mental struggle to approach an answer is where much of the learning is. They also miss out on the reward for persistence of seeing things fall together.

It is troubling. It suggests a plateauing of human understanding.

regularfry 25 minutes ago||
It absolutely is where the learning is, that's pretty well established brain science.
leptons 2 hours ago|||
We're on the road to Idiocracy.
DeathArrow 2 hours ago||
I don't think the experience with Gemini will be the same when using GPT.
pmontra 5 hours ago||
It's a very long post with a mix of technical (math) and philosophical sections. Here are the most striking points to reflect upon IMHO.

> It seems to me that training beginning PhD students to do research [...] has just got harder, since one obvious way to help somebody get started is to give them a problem that looks as though it might be a relatively gentle one. If LLMs are at the point where they can solve “gentle problems”, then that is no longer an option. The lower bound for contributing to mathematics will now be to prove something that LLMs can’t prove, rather than simply to prove something that nobody has proved up to now and that at least somebody finds interesting.

Training must start from the basics though. Of course everybody's training in math starts with summing small integers, which calculators have been doing without any mistake for a long time.

The point is perhaps confirmed by another comment further down in the post

> by solving hard problems you get an insight into the problem-solving process itself, at least in your area of expertise, in a way that you simply don’t if all you do is read other people’s solutions. One consequence of this is that people who have themselves solved difficult problems are likely to be significantly better at using solving problems with the help of AI, just as very good coders are better at vibe coding than not such good coders

People pay coders to build stuff that they will use to make money and I can happily use an AI to deliver faster and keep being hired. I'm not sure if there is a similar point with math. Again from the post

> suppose that a mathematician solved a major problem by having a long exchange with an LLM in which the mathematician played a useful guiding role but the LLM did all the technical work and had the main ideas. Would we regard that as a major achievement of the mathematician? I don’t think we would.

bambax 2 hours ago||
> by solving hard problems you get an insight into the problem-solving process itself, at least in your area of expertise, in a way that you simply don’t if all you do is read other people’s solutions. One consequence of this is that people who have themselves solved difficult problems are likely to be significantly better at using solving problems with the help of AI, just as very good coders are better at vibe coding than not such good coders

Yes but it's not just that if you solved a problem yourself, you're better at solving other problems; it's also that you actually understand the problem that you solved, much better than if you simply read a proof made by somebody (or something) else.

I see this happening in the enterprise. People delegate work to some LLM; work isn't always bad, sometimes it's even acceptable. But it's not their work, and as a result, the author doesn't know or understand it better than anyone else! They don't own it, they can't explain it. They literally have no value whatsoever; they're a passthrough; they're invisible.

tempaccount5050 1 hour ago||
Are you a cutting edge research scientist or something? Everyone I know works in the same domain every day. The problems are the same. People aren't solving brand new problems to humanity every day. We make budgets and look at ticket counts. Roll out patches. Replace hardware. Upgrade software packages. Make a new dashboard to track a project. I guess if every day is a completely novel thing for you, ok. I feel like the goalposts have moved to an absolutely ridiculous place. Oh no, I won't have a bunch of random error log numbers memorized anymore? Who gives a shit. I just want to afford a place to live so I can play my guitar and make something good for dinner. Maybe I'm just old, but I don't see why the average person needs to be a fuckin genius problem solver.
palata 2 hours ago|||
I feel like you slightly miss both points.

> Training must start from the basics though.

Sure, but the point is that at some point (e.g. when starting a PhD) one needs to do research, not learn the basics. And LLMs make that harder, because they solve the "easy research" part.

Take a young lion "fighting/playing" with another young lion as a way to learn how to fight, and later hunt. And suddenly they get TikTok and are not interested in playing anymore. Their first encounter with hunting will be a lot harder, won't it?

> People pay coders to build stuff that they will use to make money and I can happily use an AI to deliver faster and keep being hired.

Again, that's true but missing the point: if you never get to be a "good coder", you will always be a "bad vibe coder". Maybe you can make money out of it, but the point was about becoming good.

kerabatsos 4 hours ago||
But perhaps we should regard it as a major achievement.
lmpdev 4 hours ago||
I mean in the same way getting Wolfram Alpha to solve a really hard/ugly differential equation I suppose
kang 22 minutes ago||
> The lower bound for contributing to mathematics will now be to prove something that LLMs can’t prove, rather than simply to prove something that nobody has proved up to now and that at least somebody finds interesting.

5.5pro is amazing but this implication might not be true & is the core argument of this piece.

AI will prove all sort of things - interesting, boring & incorrect.

To sort it will be the task of the PhD.

mxwsn 4 hours ago||
> Here’s a thought experiment: suppose that a mathematician solved a major problem by having a long exchange with an LLM in which the mathematician played a useful guiding role but the LLM did all the technical work and had the main ideas. Would we regard that as a major achievement of the mathematician? I don’t think we would.

This is a cultural choice. It makes sense that in the mathematics culture we currently have, this is alien. But already, other fields, and many individuals, would disagree and say that the human did have a major achievement here. As long as human-AI collaborations are producing the best results, there is meaningful contribution by the humans, and people that are deeper experts and skilled LLM whisperers should be able to make outsized contributions. The real shoe drops when pure AI beats humans and human-AI collaboration.

pmontra 3 hours ago||
I replied to a comment about AI in sports and I build on that.

We praise car drivers even though most of the performance in their sport comes from the car. The driver makes the difference when two cars are close in performance, through brilliance or mistakes. Horse riders too.

In the case of math, the human can lead the LLM onto the right track, or point it to one problem or another. So the human deserves some praise.

Then the team that built the car, cared for the horse, or built the AI might deserve even more praise, but we tend to care more about the single most visible human.

bambax 3 hours ago||
It may not be a major achievement by the mathematician (although it's debatable) but it would still be a major result.
NotOscarWilde 5 hours ago||
As a TCS assistant professor from Eastern Europe, I am always a little jealous of the biggest names in math having such easy access to the expensive, long-thinking models.

Paying for Pro from any of my current academic budgets is completely out of the realm of reality here -- all budgets tend to have restricted uses and software payments fit into very few categories. Effectively, I'd have to ask for a brand new grant and hope the grant rules allow for large software payments and I won't encounter an anti-AI reviewer; such a thing would take one year at least.

As a nail in the coffin, I was "denied" all Claude Opus recently as part of Microsoft's clampdown on individual (and academic) use of Copilot.

(ChatGPT 5.5 Plus does not seem sufficient for any deeper investigations into new research topics, I've tried.)

Apologies for the rant.

vthallam 4 hours ago||
@NotOscarWilde drop your email here; I will reach out, and I'm happy to get you a Pro account for a few months so you can try 5.5 Pro. (I work at OAI.)
teiferer 3 hours ago|||
While this sounds generous (and in some ways it is), it does not address the general point that GP is making. That is, the systematic disadvantage which large parts of humanity have w.r.t. access to the tools. You could say they can't drive a Lamborghini either, but that also doesn't solve the problem.
NotOscarWilde 2 hours ago|||
You're absolutely right (pun intended).

An aside: It was a very nice gesture and completely unexpected by me, so even if it doesn't work out, it made my day. I personally believe that kind gestures have a lot of power.

Back on topic: There is a real danger of the gap between rich and poor universities significantly widening in all fields if the rich can afford Pro level models, or even hardware that can run their own comparable models, and this being fiscally inaccessible to the rest.

One can sweep this under the rug by blaming the educational funding but this just shoots down all discussion. Even if GDP of a country goes up by a lot -- such as Poland -- it takes time before any budget benefit trickles to the education budget, and with some governments it might never do.

I believe Microsoft et al do have the most power here to boost affordable access to AI for researchers on a large scale; the fact that they cut some too expensive models (Opus, 5.5) from their academic benefits package is a grim omen. I do realize they would like universities to pay them also, and ultimately the universities should do that -- but then we are back at the institutional level of the problem.

Scea91 2 hours ago||||
It's a problem of the individual institutions and countries. The budget required for AI tools currently is negligible compared to other university expenses. We don't need to call everything a systemic disadvantage when the disadvantaged (at the institution level) have agency here.
NotOscarWilde 2 hours ago||
Can you tell me what is the budget necessary to supply AI tools capable of substantial research assistance to all academic staff at a university?

You seem to have a good estimate in your head; I definitely do not.

From personal experience, ChatGPT 5.5 (the Plus tier) is excellent for programming tasks and also for various teaching related tasks but I have not observed the research benefits that Tim Gowers has when I asked it questions in my area of expertise. So the costs are definitely higher than a few dozen $ a month per PhD/professor.

You might be right that universities should immediately spring into action and demand funding for research level AI resources and hardware. One thing you might be mistaken in is that public universities are unfortunately very inflexible institutions; one reason for this is that they have a large internal leadership structure AND they are funded by the state, so even if the entire university agrees on something, the funding is at the whim of the ministry of education and thus the current political leadership.

krab 2 hours ago||
> Can you tell me what is the budget necessary to supply AI tools capable of substantial research assistance to all academic staff at a university?

I think the GP meant that *if the tools provide substantial benefit* to staff, their costs can be compared to salaries and other large expenses of the university. The $100/month subscription costs less than your office space.

snayan 2 hours ago|||
I mean, I don't think OpenAI should be wading into the policies and practices of foreign institutions and governments. Look at all the blowback we see from the collision of Anthropic or OpenAI and the US government.

At present, the tools are available for whoever wants to buy them. Not OpenAI's fault that the parent commenter's government and/or institution's policies haven't been updated to allow for their purchase and use.

I'd argue that the OpenAI dude/dudette's level of generosity is appropriate given the circumstances.

NotOscarWilde 3 hours ago||||
This requires a major "dox" of myself, but I am really grateful for the offer, so these are my academic contacts:

https://pastebin.com/hNYrCjhL

I probably will erase the contents in a few days.

Even if you just drop an email and it doesn't work out, I appreciate this gesture so much. Thank you.

vthallam 3 hours ago|||
Got the contact, will reach out tomorrow, you can delete them.
teiferer 3 hours ago|||
[flagged]
thierrydamiba 4 hours ago|||
Shoutout to you. I will match it if they need other resources. (I don’t work at OAI, just think this is cool)
NotOscarWilde 2 hours ago|||
I will leave the contact up for a bit longer if people want to get in touch and share their experience with the research gap of the models -- or anything, really -- but I do not think there is any need of further support. Like I said elsewhere, the offer of support made my day and the gesture is enough.

Thank you.

alsetmusic 4 hours ago|||
You know what, I'm ashamed that I didn't think of this. I'll sponsor three months. Email in my hn profile. I don't understand the math in the article, but I'd love to help you make progress in it.
fragmede 4 hours ago||
same.
johndough 3 hours ago|||
At my university, everyone had to pay their AI subscriptions out of their own pocket, until a communal AI service was introduced recently. It took 2 years to set up and only serves gpt-oss-120b, so everyone is still using other services. But at least some admin can scatter the word "AI" all over the university's website now and has an excuse to reject any requests for AI subscriptions because "we already have AI".
alsetmusic 4 hours ago|||
It’s a classic example of the best positioned people being in the best position to keep reaping all the rewards.

There’s the example of a poor person and a rich person buying boots. The poor person’s boots wear out and have to be replaced, while the rich person’s boots last for many years due to higher-quality craftsmanship. Over the years, the poor person will pay more for boots.

huijzer 4 hours ago|||
I know the example, but as a counter-argument: often more expensive boots are not more durable. It’s about spending time to learn to spot the quality.

Of course if you are really poor, then you have to take expensive shortcuts, but for most people that shouldn’t be the case. Learning to do more with less money isn’t as bad as many people think. It’s also good for the brain to be a bit more creative.

m_mueller 3 hours ago|||
Here I think it's less about "poverty" (non-US academic budgets are still high, though not in the same sphere) and more about red tape when it comes to software. My experience doing a PhD in Japan was: everything you can touch was basically a free-for-all, including $500 keyboards and $10k Mac Pros, especially if you are a valued researcher. But software, oh man, how can we prove receipt of goods to accounting...
bambax 3 hours ago|||
OpenRouter lets you pay by the token only (no subscription), has all the frontier models (including Opus 4.7, GPT-5.5) and most of the others, and if you use it sparingly it usually turns out to be quite cheap.
johndough 3 hours ago|||
API pricing for Claude is about an order of magnitude more expensive than subscriptions (numbers: https://she-llac.com/claude-limits). But it may be worth it with DeepSeek V4 Pro, which is currently on discount.
bambax 3 hours ago||
Depends very much on usage! If you connect it to tools like Cursor, etc. then yes a subscription is probably cheaper -- although, you'd have to subscribe to each provider if you want to use them all.

But if you ask questions occasionally, (and don't resend, for example, your whole codebase with each request), then the API feels really cheap, even for the frontier models.

tasuki 1 hour ago|||
My problem with pay-by-the-token is that it discourages me using the thing ("oh the prompt will cost me $0.1"), so I pay a subscription which I'm pretty sure costs me about two-three times what I'd pay just for the api costs, but encourages me to use it more ("oh I have a subscription already, better make use of it").
nerdsniper 1 hour ago|||
I believe ChatGPT 5.5 Pro access is available for $100/month, is that an unrealistic level of expense for someone in your position and geography? Even if the university won't pay for it, it seems you'd like to use this tool for your own goals.

I'm not trying to shame here, just curious whether this is completely unattainable for most researchers in your area.

Computer0 40 minutes ago||
It appears that in their country someone in their position makes about 50k usd annually. I make a similar amount in my country and cannot justify it.
ziotom78 4 hours ago|||
I fully understand your rant! I pay ~20€/month for the Pro account, as my university has a deal with Microsoft and only seems to recognize Copilot, so it’s very hard to use one’s own funding to pay for something else.
qq66 4 hours ago|||
Paste what you want me to ask 5.5 Pro and I'll paste you the response.
dyauspitr 4 hours ago||
[flagged]
bananaflag 4 hours ago|||
For a TCS assistant professor in Eastern Europe, $200/month would be 20% of their salary.

And the situation is better, ten years ago it would have been 80%.

iammrpayments 4 hours ago||||
The average European salary is around $4000/month; in Eastern Europe it is half of that. The median is probably lower. Makes me want to quit visiting places like reddit where everybody claims to be making 100k+/year.
goobatrooba 4 hours ago||
All salary discussions need a cost-of-living context. Yes, in Europe you earn a bit less, but the public services are much better than in the US and one emergency (e.g. healthcare) won't ruin you, as it's mostly a public system.

I'll take a Euro salary and quality of life over a FIRE-type salary and daily fear of falling into the abyss any day.

revolvingthrow 4 hours ago||
Given the topic and the fact llm providers charge global rates, the absolute take-home money is much more relevant. Even if you live like a king on $1000/mo, 5.5 pro is still $200.
fakedang 4 hours ago||
Their loss if they don't move to regional pricing. AI will continue to remain an upper-management luxury then, and won't reach the mass adoption required to justify their outsized valuations.
revolvingthrow 3 hours ago||
Regional pricing makes sense for products that don’t have ongoing costs or where most of the input cost can be offset by local labor. You’re not buying server racks nor electricity at 1/3 of the price to serve poorer markets
teiferer 2 hours ago||
AI pricing is not mainly about cost, it's about market realities, i.e., charging exactly the sweet spot to maximize profit.
xanrah 4 hours ago||||
Lots of people in the west can’t afford 200 a month. How rich are you?
dyauspitr 4 hours ago||
That’s what most people spend on their phone and Internet connections per month in the US. That’s what the average American family spends on just five days of food.
sevg 4 hours ago|||
You can afford five days of food, so that must mean you can also afford a Claude Max plan? What kind of logic is this?
skrebbel 4 hours ago||||
Fwiw your comments here read to me as “I’m super rich and everyone I know is super rich too, and I can’t imagine that anyone isn’t”.
dyauspitr 4 hours ago||
People spend much more than that on just commuting to work. If you can spend $200 a month to supercharge what you do at work and 1000x your productivity, it’s a no-brainer.
skrebbel 3 hours ago||
From what money? Just pause the health insurance for a while? Stop paying the rent? No diapers for the kid?

Your entire story only makes sense if you have many hundreds of dollars/euros of entirely disposable income every month left, after all unavoidable expenses have been paid for. I understand that this holds for you and everyone you know but I’d like you to appreciate that for very many people it doesn’t.

fuzzy2 4 hours ago||||
Yes and? That's money that is already allocated. It cannot be spent on something else.
xmprt 4 hours ago||
No you don't get it. If the family just starved for 5 days then they could increase revenue for these AI companies.
xanrah 4 hours ago|||
37% of Americans would be unable to cover a 400 usd unexpected expense* without using one or more credit cards. 13% would flat out be unable to cover it. [1]

Are you honestly saying most families would be able to justify 200 usd a month for ChatGPT?

https://www.federalreserve.gov/publications/2025-economic-we...

NotOscarWilde 4 hours ago||||
There is a significant gap between what academics are paid across European countries, and since most top universities here are public institutions, you are right -- Eastern European government employees tend to be on the poorer side.

There are several other philosophical arguments against what you propose but I do not wish to go down that route.

skullone 4 hours ago||||
Bruh, $200/m for most people in the US is also a hard "no!". That's a lot of money. Plus Anthropic isn't doing good deals with orgs that spend less than 250k a month. It's ridiculous.
jdw64 4 hours ago|||
[dead]
MrDrDr 49 minutes ago||
> "Even though I can motivate it in retrospect, ChatGPT’s idea to use h^2-dissociated sets to control relations of order at most h feels quite ingenious. As far as I can tell, this idea is completely original."

The question that keeps bothering me is: can an LLM generate an idea that is truly novel? How would/could that actually happen? But then that leads to the question: what are we actually doing when we think?

Perhaps it's as simple as the ability to just make mistakes that matters, the same things that powers evolution. As long as the LLM can make mistakes, it's capable of generating something genuinely novel. And it can make more mistakes much faster than we can.

humanfromearth9 38 minutes ago||
For my paper about ME/CFS, I let an LLM integrate lots of findings of other scientific papers. Then I ask the LLM to "creatively brainstorm", given all we know of ME/CFS and the newly integrated paper, to generate new hypotheses, treatment ideas or any other kind of insight it can think of.

This works really well.

Now, it's clear that I have no idea how much of this is something we would consider new and original, and how much is a kind of systematic, but not novel, way of thinking.

What I couldn't do so far is get an LLM to generate a truly new maths theory, with new abstract concepts and dimensions and points of view. The kind that is not just a combination of existing theories and logic.

eterm 38 minutes ago|||
My own take, and it's veering into the Philosophy of Mathematics, but there's a debate about whether Mathematics is "Invented" or "Discovered".

If it's "invented", then it requires ingenuity.

If it's "discovered", then it was always already there, just waiting for the right connections to be made for it to be uncovered and represented in a way we can understand.

Invention requires ingenuity, but discovery does not. So if LLMs can generate truly novel mathematics, for me that settles it that mathematics is indeed discovered, as LLMs are quite capable of discovery yet I don't consider them capable of invention.

MrDrDr 25 minutes ago||
I like this distinction, but it would then seem the only 'invention' would be the axioms of your mathematics. There exist numbers (natural, imaginary...), there exist shapes (a point, a line...). All the work from that point on could be 'discovered'. I agree that I don't see LLMs inventing in this way. But again, it raises the question: what are our brains doing when we 'invent' something?
LiamPowell 36 minutes ago|||
Trivially, the answer is yes by the infinite monkey theorem. If we allow the sampler to pick any token, then any stream of arbitrary tokens can be generated. Therefore, if an original idea can be represented with written words, then an LLM can generate it. That is perhaps not the most satisfying answer, but if you want a better one you'll need to provide a function that determines if an idea is original.
jasfi 38 minutes ago|||
It's about the ability to combine ideas in novel ways, without breaking the rules in relevant frameworks. Sometimes the idea may even be to contradict existing theories where they are weak.
ikari_pl 44 minutes ago||
How do you define a new idea?

To me, it's rearranging the information you had in a way that hasn't been applied or published before.

That's literally what LLMs are built for.

few 5 hours ago||
>So if your aim in doing mathematics is to achieve some kind of immortality, so to speak, then you should understand that that won’t necessarily be possible for much longer — not just for you, but for anybody.

This made me a little sad

jdale27 3 hours ago||
I don't know that it's that disappointing. I doubt most of the great mathematicians were actually doing it to achieve immortality. I suspect most of them were either after (possibly indirect) practical applications (via the math -> physics -> engineering pipeline) or just "for the love of the game", appreciation of the beauty of math and the intellectual joy of doing it. AI might also take over the practical application side, but the other aspects are still there for the taking.
hodgehog11 3 hours ago||
Exactly. Gowers is in the unique position to think about the "glory" of frontier mathematics, but for essentially everybody (especially those working outside of number theory), that dream died long ago. There are far too many mathematicians now.

Many mathematicians work because they love the breakthrough (a certain quote of Villani comes to mind). They love finding new results, uncovering new mysteries. From that point of view, having an AI that can build on your basic ideas and refine them into more powerful arguments is awesome, regardless of who gets the credit. There are those that treat it more like solving puzzles so the result is not of interest. From that point of view, I can see the dissatisfaction. But I have found those with that viewpoint don't tend to make it as far in academia as those with the other viewpoint.

bananaflag 5 hours ago||
Now repeat that for every sort of human achievement
bel8 4 hours ago||
Machines are coming even after table tennis :(

https://www.youtube.com/watch?v=VVEzgYxDdrc

pmontra 4 hours ago||
Sports are safe. Machines came after runners (MotoGP, Formula 1) and yet we cheer the winners of the 100 m at the Olympic Games. Fully autonomous bikes and cars won't change that. AIs destroy chess players. We still cheer the world champion.

We care about sports with humans.

fragmede 3 hours ago||
Robot MotoGP would be amazing to see just how far the limits could be pushed without risking the life of a human though. Or even full size remote control.
Ekaros 1 hour ago||
Sadly, I don't think there are any safe tracks for proper autonomous car racing without limits... Still, it would be interesting to see what the absolute best you could do is if the rules include only, say, a minimum number of wheels and maximum dimensions for vehicles.
MinimalAction 5 hours ago||
As a graduate student, this piece made me sad. I always believed that my work speaks for itself and transcends beyond my limited time on this cosmic experience. This notion of immortality was just a small intangible bonus I hoped for when I jumped into grad school. AI is making me feel less worthy.
hodgehog11 3 hours ago||
As someone who is much further down the track, I would kindly suggest you drop that line of thought. I've seen far too many brilliant and ambitious people drop into depression because of it.

You are worthy of doing this work because you are able to do it. Do the work because you love it and because you love the mystery. Enjoy every moment that you get to do it. Find joy in the great fortune you have to do this work while others toil away on tasks that bring them no satisfaction. Sometimes it's tedious, but sometimes it's incredibly rewarding in its own right.

Don't work for the possibility of eternal glory though, it just doesn't exist anymore.

ionwake 2 hours ago|||
I feel bravery transcends time better than the odd scientific breakthrough, which is often attributed to one person but whose roots came from a "lesser" unknown.
whatever120 4 hours ago|||
You are worthy. You will hone your skills in grad school and be able to command these AIs better than somebody who hasn’t struggled with hard problems for a long time.
jlarcombe 4 hours ago||
A depressing thought that all that work is just so you can "command AIs better"
folderquestion 2 hours ago|||
It could happen that the AI, in the near future, is not something external but just a part of your brain, so you retain the glory.
jlarcombe 1 hour ago||
Hah this is getting worse and worse
alexashka 1 hour ago|||
All that work to kick a ball into a net.

Nobody looks at this species and goes hm, rational and reasonable :)

alexashka 1 hour ago||
> I always believed that my work speaks for itself and transcends beyond my limited time on this cosmic experience

Any statement preceded by the word 'believe' is a coping mechanism.

> This notion of immortality was just a small intangible bonus I hoped for when I jumped into grad school

Any statement preceded by the word 'hope' is a coping mechanism.

> AI is making me feel less worthy

Worth comes from understanding, not achievement.

iandanforth 25 minutes ago||
I found the section on publishing very interesting. Even if the quality of the output is up to snuff, where should it go? arXiv doesn't allow AI-written work. The author proposes that only work that has been certified by a human should be published. However, now the field is in the same boat as software engineering, where we are facing a glut of pull requests and not enough time and people to review them.
bustermellotron 5 hours ago|
I saw Tim Gowers give a talk at the AMS-MAA joint meeting in Seattle about ten years ago where he predicted that in 100 years humans would no longer be doing research mathematics. I wonder if he’s adjusted his timeline.

At the time I thought the key missing tool was a natural language search that acted like mathoverflow, where you could explain your problem or ideas as you understood them and get references to relevant literature (possibly outside your experience or vocabulary).
