Posted by simonw 7 days ago
People posting their subjective experience is precisely what a lot of these pieces should be doing. Good or bad, their experience is the data they have to contribute.
The plural of anecdote is not data. These subjective posts about experiences with vibe coding and the like may be entertaining, but reading 10 of them doesn't give you an objective view of the state of LLMs. It gives you 10 opinions by 10 people who chose to blog about how they felt using a tool.
Second of all, Simon's content is often informative, more or less sticking to the facts, not flame bait. I never upvote or flag any content from anyone.
I called out the terrible scatter plot of the latitude/longitude points because it helped show that this thing has its own flaws.
I know so many people who are convinced that ChatGPT's search feature is entirely useless. This post is mainly for them.
Those are the kinds of things I look out for and try to write about.
I didn’t feel that he was framing it as _revolutionary_; it felt more evolutionary.
Simon, for every person miffed about your writing, there is another person like me today who said “ok, I guess I should sign up for Simon’s newsletter.” Keep it up.
It’s easy to be a hater on da internet.
42lux, if you have better articles on AI progress do please link them so we can all benefit.
I wanna know when my research goblin can run on my box with 2x 3090s.
I skipped half the article to get to the point, then went back and re-read it, and didn't miss much.
This is fine. He is his own person and can write about whatever he wants and work with whoever he wants. But the days when I'd eagerly read his blog to keep a finger on the pulse of all the main developments in the main labs/models have passed, as he seems to only really cover OpenAI these days, and major events from non-OpenAI labs/models don't even get a mention, even if they're huge (e.g. nano banana).
That's fine. It's his blog. He can do what he wants. But to me personally he feels like an OpenAI mouthpiece now. But that's just my opinion.
My most recent posts:
- https://simonwillison.net/2025/Sep/7/ai-mode/ - Google/Gemini
- https://simonwillison.net/2025/Sep/6/research-goblin/ - OpenAI/GPT-5
- https://simonwillison.net/2025/Sep/6/kimi-k2-instruct-0905/ - Moonshot/Kimi/Groq
- https://simonwillison.net/2025/Sep/6/anthropic-settlement/ - Anthropic (legal settlement)
- https://simonwillison.net/2025/Sep/4/embedding-gemma/ - Google/Gemma
So far in 2025: 106 posts tagged OpenAI, 78 tagged Claude, 58 tagged Gemini, 55 tagged ai-in-china (which includes DeepSeek and Qwen and suchlike).
I think I'm balancing the vendors pretty well, personally. I'm particularly proud of my coverage of significant model releases - this tag has 140 posts now! https://simonwillison.net/tags/llm-release/
OpenAI did get a lot of attention from me over the last six weeks thanks to the combination of gpt-oss and GPT-5.
I do regret not having written about Nano Banana yet, I've been trying to find a good angle on it that hasn't already been covered to death.
You are. Pretty much my main source these days to get a filtered down, generalist/pragmatic view on use of LLMs in software dev. I'm stumped as to what the person above you is talking about.
OT: maybe I missed this, but is the Substack new? And is there any reason (besides visibility) you're launching newsletters there vs. on your wonderful site? :)
I wrote about how it works here: https://simonwillison.net/2023/Apr/4/substack-observable/
Sometimes I feel that there's pressure for me to be a full-blown newspaper covering everything that happens in a multi-billion-dollar industry.
I wrote about my approach to that here: https://simonwillison.net/2024/Dec/22/link-blog/#trying-to-a...
> he feels like an OpenAI mouthpiece now
That seems a little harsh. But I felt the same about older blogs I used to read, such as CodingHorror. They just aren’t for me anymore after diverging into other topics.
I really liked this article and the coining of the term “Research Goblin”. That is how I use it too sometimes, which is also how I used to use Google.
https://news.ycombinator.com/submitted?id=simonw
Or take a look at his website:
At least you admit it's your opinion. Maybe that's your bias showing?
Personally I generally enjoy the blog and the writing, but not so much this post. It has a very clickbaity title for some results which aren't particularly impressive.
I find it informative that search works so well. I knew it worked well, but this feels like a step above whatever Gemini, my go-to chatbot workhorse, can do.
This is on purpose, because we want good stories to get multiple chances at getting noticed. Otherwise there's too much randomness in what gets traction.
Plenty of past explanations here:
https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...
The 8 hours don't seem to count if you submit under a different domain. Or do they reset after each try?
It would also be great if you would answer emails, especially ones related to GDPR. You have two of them in your inbox from over 6 months ago, sent from the email address in my account.
From https://en.wikipedia.org/wiki/Social_news_website : "A social news website is a website that features user-posted stories. Such stories are ranked based on popularity, as voted on by other users of the site or by website administrators."
The article was recently published, and users on HN submitted it. Other users thought it interesting and upvoted it. Earth has different time zones (I understand it's difficult for Americans to grasp), so different people are active at different times.
I'd say trust is a pretty reasonable way to assign attention.
I guess the fairest way might theoretically be to require everything to be submitted anonymously, with authorship (or maybe submissionship) only being revealed after some assigned period?
This is better for the up-and-comers, but it would require a huge amount of energy compared to "Oh, Simon finds this interesting, I'll take a looksy".
The AI space is full of BS and grift, which makes reputation and the resulting trust built on that reputation important. I think the popularity of certain authors has as much to do with trust as anything else.
If I see one of Simon’s posts, I know there’s a good chance it’s more signal than noise, and I know how to contextualize what he’s saying based on his past work. This is far more difficult with a random “better” article from someone I don’t know.
People tend to post what they follow, and I don’t think it’s lazy to follow the known voices in the field who have proven not to be grifting hype people.
I do think this has some potential negatives, i.e. sure, there might be “much better” content that doesn’t get highlighted. But if the person writing that better content keeps doing so consistently, chances are they’ll eventually find their audience, and maybe it’ll make its way here.
Saying that someone ought to write better consistently for their work to "make its way here" leans completely into the cult of personality.
I think following people would be better served through personal RSS feeds, and letting content rise based on its merit ought to be an HN goal. How that can be achieved, I don't know. What I am saying is that the potential negatives are far, far more serious than they're being made out to be.
> Saying that someone ought to write better
I did not say someone ought to write better. I described what I believed the dynamic is.
> I think following people would be better served through personal RSS feeds
My point was that this is exactly what people are doing, and that people tend to post content here from the people they follow.
> letting content rise based on its merit ought to be an HN goal
My point was that merit is earned, and people tend to attach weight to certain voices who have already earned it.
Don’t get me wrong. I’m not saying there are no downsides, and I said as much in the original comment.
HN regularly upvotes obscure content from people who are certainly not the center of a cult of personality. I was attempting to explain why I think this is more prevalent with AI and why I think that’s understandable in a landscape filled with slop.
Your Exeter cavern quandary was not exactly sorted. https://simonwillison.net/2025/Sep/6/research-goblin/#histor...
They are quite old and very well documented, so how on earth could an LLM fuck up, unless an LLM is some sort of next-token guesser ...
I made fun of its attempt at drawing a useless scatter chart.
That example wasn't meant to illustrate that it's flawless - just that it's interesting and useful, even when it doesn't get to the ideal answer.