Posted by simonw 7 days ago
HN is a bit weird because it's got 99 articles about how evil LLMs are and one article that's like "oh hey I asked an LLM questions and got some answers" and people are like "wow amazing".
Not that I mind. I assume Simon just wanted to share some cool nerdy stuff and there's nothing wrong with the blog post. It's just surprising that it's posted not once but twice on HN and is on the front page when there's so much anti-AI sentiment otherwise.
Often the results were bad, so the answer was bad.
GPT-5 Thinking (and o3 before it, but very few people tried o3) does a whole lot better than that. It runs multiple searches, then evaluates the results and runs follow-up searches to try to get to a credible result.
This is new and worth writing about. LLM search doesn't suck any more.
FWIW Gemini at least has been pretty good at this since late 2024 IMO.
As for where things are now, I just ran a comparison with ChatGPT 5 in thinking mode against Google search's AI mode across a few questions. They performed about the same on the searches I tried and returned substantially the same answers, with some minor variation here or there. Google search is maybe an order of magnitude faster. Google obviously has an advantage here: full access to its own search and ranking index.
And of course the ability to make multiple searches and reason about them has been available for months, maybe almost a year, as deep research mode. I guess the novelty now is that you wait less time and get research that's less deep.
The results look reasonable? It’s a good start, given how long it takes to hear back from our doctor on questions like this.
There is one big failure mode though - ChatGPT hallucinates the middle of simple textual OCR tasks!
I will feed ChatGPT a simple computer hardware invoice with 10 items - the first few items come out perfect, then come plausible but fake middle items (like MSI 4060 16GB instead of Asus 5060 Ti 16GB), and the last few items are again correct.
If you start prompting with hints, the model keeps making up other models and manufacturers; it apologizes and comes up with an incorrect Gigabyte 5070.
I can forgive mistaking 5060 for 5080 - see https://www.theguardian.com/books/booksblog/2014/may/01/scan... . However, how can the model completely misread the manufacturers??
This would be trivially fixed by reverting to a Tesseract-based pipeline, like ChatGPT used to do.
PS Just tried it again: for the 3rd item it gave Kingston instead of the correct GSKILL as the RAM manufacturer.
Basically ChatGPT sort of OCRs like a human would: it scans the start, confabulates the middle, and gets the footer correct.
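One cheap way to catch this failure mode is to cross-check the LLM's transcription against a deterministic OCR pass (e.g. Tesseract) and flag lines that diverge. A minimal sketch in Python's stdlib `difflib` - the invoice items and the 0.8 similarity threshold here are illustrative assumptions, not real data:

```python
import difflib

# Hypothetical example: line items from a deterministic OCR pass
# (e.g. Tesseract) vs. items an LLM returned for the same invoice.
ocr_items = [
    "Asus 5060 Ti 16GB",
    "GSKILL 32GB DDR5",
    "Samsung 990 Pro 2TB",
]
llm_items = [
    "Asus 5060 Ti 16GB",
    "Kingston 32GB DDR5",  # confabulated manufacturer
    "Samsung 990 Pro 2TB",
]

def flag_mismatches(ocr, llm, threshold=0.8):
    """Pair up lines and flag any pair whose similarity falls below threshold."""
    flagged = []
    for a, b in zip(ocr, llm):
        ratio = difflib.SequenceMatcher(None, a, b).ratio()
        if ratio < threshold:
            flagged.append((a, b, round(ratio, 2)))
    return flagged

# Flags the GSKILL/Kingston pair; identical lines pass untouched.
print(flag_mismatches(ocr_items, llm_items))
```

This only catches disagreements between the two transcriptions, of course - it assumes the deterministic pass is the more trustworthy of the two, which for printed invoices it usually is.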
...
I used to play games on my computer a lot. Not so much anymore, don't really want to lock myself in a room alone and play games. I have kids and a wife, and it feels isolative.
But back then I would, and often my hardware was too underpowered to experience the game in its full glory. I would often spend hours and hours just honing settings, config, and environment to get the game running at peak capability on my machine.
At some point, I would reach a zenith: some perfect arrangement of settings and environment that gave me the game running at top quality on my machine (or as close to top as I could get). The experience was joyous. So enjoyable that I often didn't even play the game, except maybe to test the boundaries of its performance at that level.
Reading this article made me sad for people who don't put in work toward some sort of accomplishment, even one that amounts to nothing. And it made me think of my own experience with it. Accomplishment for its own sake is still accomplishment. And it's still self-realization, which is important to existing.
The context: I was rushing for a train, I ran into Starbucks at the station for a coffee, I noticed they didn't have cake pops and the staff member didn't appear to know what they were.
I see three choices here:
1. Since I'm mildly curious about Starbucks and cake pop availability in the UK, I get on the train, open up my laptop and dedicate realistically a solid half hour or more to figuring out what's going on.
2. I fire off a research question at GPT-5 Thinking on my mobile phone.
3. I don't do any research at all and leave my mild curiosity unsatiated.
Realistically, I think the choices are between 2 and 3. I was never going to perform a full research project on this myself.
See also: AI-enhanced development makes me more ambitious with my projects, which I wrote in March 2023 and has aged extremely well. https://simonwillison.net/2023/Mar/27/ai-enhanced-developmen...
I do plenty of deep dive research projects myself into topics both useful and pointless - my blog is full of them!
Now I can take on even more.
Alternatively, you could have spent that half hour on the train exercising your own creativity to try and satisfy your curiosity. Whether you're right or wrong doesn't really matter, because as you acknowledge it's not really important enough to you to matter. Picking (2) eliminates all the possible avenues that might have led you down.
I'm not saying one is better than the other, just that you're approaching the criticism on the basis of axioms that represent a narrow viewpoint: That of someone who has to be "right" about the things they are curious about, no matter how trivial.
I spent my half hour on the train satiating all sorts of other things instead (like the identity of that curious looking building in Reading).
> Picking (2) eliminates all the possible avenues that might have led you down.
I don't think that's the case. Using GPT-5 for the Cake Pop question led me down a bunch of avenues I may never have encountered otherwise - the structure of Starbucks in the UK, the history of their Cake Pops rollout, the fact that checking nutritional and allergy details on their website is a great way to get an "official" list of their products independent of what's on sale in individual stores - and it sparked me to run a separate search for their Cookies and Cream cake pop and find out it had been discontinued in the US.
Not bad for typing a couple of prompts on my phone and then spending a few extra minutes with the results after the research task had completed.
Now multiply that by a dozen plus moments of curiosity per day and my intellectual life feels genuinely elevated - I'm being exposed to so many more interesting and varied avenues than if I was manually doing all of the work on a smaller number of curiosities myself.
I don't disagree: I just posited that there are other ways to satisfy it, and that there is an opportunity cost to the path you've chosen, one you don't seem very aware of because your curiosity and desire to be correct are tightly coupled. But that doesn't actually have to be the case. It has its pros and cons.
Now I'm more of an "it's the journey, not the destination" guy, so accelerating the journey doesn't appeal to me as much as it used to, because the journey is where I get the most value. That change in my perspective is what motivated me to comment.
But anyway, you clearly enjoy it and do great work, so all the best with it!
Example query: a keyboard stand with a built-in music (sheet) stand.
-- Disclaimer --
It might be connected to the web enshittification process, which has been under way for quite some time already.