Posted by simonw 7 days ago
This may nudge me to start using chatbots more for this type of query. I usually use Perplexity or Kagi Assistant instead.
Simon, what's your opinion on doing the same with other frontier systems (like Claude?), or is there something specific to ChatGPT+GPT5?
I also like the name, nicely encodes some peculiarities of tech. Perhaps we should call AI agents "Goblins" instead.
ChatGPT: Gave me the planning proposal number of the final design, along with a pointer to the government website where I can plug that number in to get floor plans and other docs.
In this case, ChatGPT was so much better at giving me a definitive source than Google - instead of the other way around.
I looked into this years ago and couldn't find a satisfying answer, but GPT-5 went off, did an "unreasonable" amount of research, cross-referenced sources, and provided an incredibly detailed answer and a believable range.
Recently, it started returning even more verbose answers. The absolute bullshit research paper that Google Gemini gives you is what turned me away from using it there. Now ChatGPT also seems to go for verbose filler rather than actual information. It is not as bad as Gemini, but I did notice.
It makes me wonder if people find the results more credible with verbose reports like that, even when the verbosity actually obfuscates the information you asked it to track down in the first place.
I do like how you worded it as a believable range rather than an accurate one. One of the things that makes me hesitant to use deep research for anything but low-impact, non-critical stuff is exactly that. Some answers are easier to verify than others, but the way the sources are presented doesn't always make the answers easy to verify.
Another aspect is that my own research skills are pretty good, if I may say so myself. I don't want them to atrophy, which easily happens with lazy use of LLMs where they do all the hard work.
A final consideration came from me experimenting with MCPs and my own attempts at creating a deep research tool I could use with any model. No matter the approach I tried, it is extremely heavy on resources and burns through tokens like nothing else.
Economically, it just doesn't make sense for me to run against APIs, which in my mind means it is heavily subsidized by OpenAI as a sort of loss leader. That's something I don't want to depend on, only to face a price hike in the future.
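To illustrate the economics: here is a back-of-the-envelope sketch of what a single deep-research run could cost against an API. Every number in it (step count, context size, per-million-token prices) is a hypothetical placeholder I picked for illustration, not actual OpenAI pricing.

```python
# Back-of-the-envelope cost for one deep-research run against an LLM API.
# All figures below are hypothetical placeholders, not real pricing.

def run_cost(input_tokens, output_tokens, usd_per_m_input, usd_per_m_output):
    """Cost in USD for one run at the given per-million-token rates."""
    return (input_tokens / 1e6) * usd_per_m_input + (output_tokens / 1e6) * usd_per_m_output

# A deep-research loop re-feeds fetched pages and prior reasoning on every
# iteration, so input tokens dominate. Assume 30 search/read/summarize steps,
# each re-sending ~50K tokens of accumulated context:
input_tokens = 30 * 50_000   # 1.5M input tokens
output_tokens = 60_000       # final report plus intermediate notes

cost = run_cost(input_tokens, output_tokens, usd_per_m_input=3.0, usd_per_m_output=12.0)
print(f"~${cost:.2f} per run")  # ~$5.22 with these placeholder numbers
```

At a few dollars per run, a few dozen runs a month already rivals a flat subscription price, which is what makes the loss-leader reading plausible.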
Putting it together (podcast‑only)
Ads (STM only): $0.85M–$0.95M/yr
Memberships: $1.5–$2.6M/yr
Working estimate (gross, podcast‑only): $2.3M – $3.7M per year for ads + its share of memberships. Mid‑case lands near $2.9M/yr gross; after typical platform/processing fees and less‑than‑perfect ad sell‑through, a net in the low‑to‑mid $2Ms seems plausible.
---
Quick answers
How many listeners? ~175K downloads per episode, ~1.4–2.1M monthly downloads; demographics skew U.S., median age ~36.
How much revenue? $2.3M–$3.7M/yr gross from ads + memberships attributable to the STM show.
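The working estimate above can be sanity-checked by naively summing the two component ranges, assuming (as the report seems to) that the full membership figure is attributable to the show. The straight sum lands close to the quoted $2.3M–$3.7M; the small mismatch at the endpoints presumably comes from rounding in the report.

```python
# Naive sanity check on the podcast-only gross revenue estimate.
# Assumption: the full membership range is attributable to the show.
ads = (0.85, 0.95)          # $M/yr, STM-only ads
memberships = (1.5, 2.6)    # $M/yr, memberships

low = ads[0] + memberships[0]    # 2.35
high = ads[1] + memberships[1]   # 3.55
mid = (low + high) / 2           # 2.95

print(f"gross: ${low:.2f}M-${high:.2f}M/yr, mid-case ~${mid:.2f}M/yr")
```

The mid-case of ~$2.95M matches the report's "near $2.9M/yr gross" before platform fees and imperfect ad sell-through are subtracted.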
It usually looks at fewer than 20 sources. Basically, it has become useless.
Conversely, just doing thinking or pro mode gives me hundreds of sources
What gives?
It's going to take a minute, so why do I need to keep looking at it and can't go read some more Wikipedia in the meantime?
This is insanely user-hostile. Is it just me who encounters this? I'm on the Plus plan on Android. Maybe you don't get this with Pro?
Here's a screenshot of what I mean: https://imgur.com/a/9LZ1jTI
It even shows me a push notification at the top of my screen when the search task has finished.
Insane ratio of "app quality" to "magic technology". The models are wild (speaking as someone in the AI mix for the last 20 years or so), but the mobile app and Codex integrations are hot garbage.
But I’ve found that no matter the error - even if I disconnect from the internet entirely - I eventually get a push notification and opening up the thread a while later shows me the full response. (disclaimer: N=1)
One of the complications is that your average query takes at least several seconds to complete - that is, long enough for the user to do something else while waiting.
Is that the goal? Send this thing off on a wild goose chase, and hope it comes back with the right answer no matter the cost?
I'd rather GDP be $1B smaller right now if it meant that Newton had spent another 30 years on physics and math.