Posted by simonw 7 days ago
This may nudge me to start using chatbots more for this type of query. I usually use Perplexity or Kagi Assistant instead.
Simon, what's your opinion on doing the same with other frontier systems (like Claude?), or is there something specific to ChatGPT+GPT5?
I also like the name, nicely encodes some peculiarities of tech. Perhaps we should call AI agents "Goblins" instead.
ChatGPT: Gave me the planning proposal number of the final design, along with a pointer to the government website where I can plug that number in to get floor plans and other docs.
In this case, ChatGPT was so much better at giving me a definitive source than Google - instead of the other way around.
I looked into this years ago and couldn't find a satisfying answer, but GPT-5 went off, did an "unreasonable" amount of research, cross-referenced sources, and provided an incredibly detailed answer and a believable range.
Recently, it started returning even more verbose answers. The absolute bullshit research paper that Google Gemini gives you is what turned me away from using it there. Now ChatGPT also seems to go for verbose filler rather than actual information. It is not as bad as Gemini, but I did notice.
It makes me wonder if people find the results more credible with verbose reports like that, even when the verbosity actually obfuscates the information you asked it to track down in the first place.
I do like how you worded it as a believable range rather than an accurate one. One of the things that makes me hesitant to use deep research for anything but low-impact, non-critical stuff is exactly that. Some answers are easier to verify than others, but the way the sources are presented doesn't always make the answers easy to verify.
Another aspect is that my own research skills are pretty good, if I may say so myself. I don't want them to atrophy, which easily happens with lazy use of LLMs where they do all the hard work.
A final consideration came from me experimenting with MCPs and my own attempts at creating a deep research tool I could use with any model. No matter the approach I tried, it is extremely heavy on resources and burns through tokens like nothing else.
Economically, it just doesn't make sense for me to run against APIs, which in my mind means it is heavily subsidized by OpenAI as a sort of loss leader. That's something I don't want to depend on, only to face a price hike in the future.
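To illustrate the economics: here is a back-of-the-envelope sketch of what a single deep-research run could cost against an API. Every number in it (step count, context size, per-million-token prices) is a hypothetical placeholder I picked for illustration, not actual OpenAI pricing.

```python
# Back-of-the-envelope cost for one deep-research run against an LLM API.
# All figures below are hypothetical placeholders, not real pricing.

def run_cost(input_tokens, output_tokens, usd_per_m_input, usd_per_m_output):
    """Cost in USD for one run at the given per-million-token rates."""
    return (input_tokens / 1e6) * usd_per_m_input + (output_tokens / 1e6) * usd_per_m_output

# A deep-research loop re-feeds fetched pages and prior reasoning on every
# iteration, so input tokens dominate. Assume 30 search/read/summarize steps,
# each re-sending ~50K tokens of accumulated context:
input_tokens = 30 * 50_000   # 1.5M input tokens
output_tokens = 60_000       # final report plus intermediate notes

cost = run_cost(input_tokens, output_tokens, usd_per_m_input=3.0, usd_per_m_output=12.0)
print(f"~${cost:.2f} per run")  # ~$5.22 with these placeholder numbers
```

At a few dollars per run, a few dozen runs a month already rivals a flat subscription price, which is what makes the loss-leader reading plausible.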
Putting it together (podcast‑only)
Ads (STM only): $0.85M–$0.95M/yr
Memberships: $1.5–$2.6M/yr
Working estimate (gross, podcast‑only): $2.3M – $3.7M per year for ads + its share of memberships. Mid‑case lands near $2.9M/yr gross; after typical platform/processing fees and less‑than‑perfect ad sell‑through, a net in the low‑to‑mid $2Ms seems plausible.
---
Quick answers
How many listeners? ~175K downloads per episode, ~1.4–2.1M monthly downloads; demographics skew U.S., median age ~36.
How much revenue? $2.3M–$3.7M/yr gross from ads + memberships attributable to the STM show.
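The working estimate above can be sanity-checked by naively summing the two component ranges, assuming (as the report seems to) that the full membership figure is attributable to the show. The straight sum lands close to the quoted $2.3M–$3.7M; the small mismatch at the endpoints presumably comes from rounding in the report.

```python
# Naive sanity check on the podcast-only gross revenue estimate.
# Assumption: the full membership range is attributable to the show.
ads = (0.85, 0.95)          # $M/yr, STM-only ads
memberships = (1.5, 2.6)    # $M/yr, memberships

low = ads[0] + memberships[0]    # 2.35
high = ads[1] + memberships[1]   # 3.55
mid = (low + high) / 2           # 2.95

print(f"gross: ${low:.2f}M-${high:.2f}M/yr, mid-case ~${mid:.2f}M/yr")
```

The mid-case of ~$2.95M matches the report's "near $2.9M/yr gross" before platform fees and imperfect ad sell-through are subtracted.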
It usually looks at fewer than 20 sources. Basically, it has become useless.
Conversely, just doing thinking or pro mode gives me hundreds of sources
What gives?
It's going to take a minute, so why do I need to keep looking at it and can't go read some more Wikipedia in the meantime?
This is insanely user-hostile. Is it just me who encounters this? I'm on the Plus plan on Android. Maybe you don't get this with Pro?
Here's a screenshot of what I mean: https://imgur.com/a/9LZ1jTI
It even shows me a push notification at the top of my screen when the search task has finished.
Insane ratio of "app quality" to "magic technology". The models are wild (speaking as someone in the AI mix for the last 20 years or so), but the mobile app and Codex integrations are hot garbage.
But I’ve found that no matter the error - even if I disconnect from the internet entirely - I eventually get a push notification and opening up the thread a while later shows me the full response. (disclaimer: N=1)
One of the complications is that your average query takes at least several seconds to complete - that is, long enough for the user to do something else while waiting.
Is that the goal? Send this thing off on a wild goose chase, and hope it comes back with the right answer no matter the cost?
I'd rather GDP be $1B smaller right now if it meant that Newton had spent another 30 years on physics and math.