Posted by walterbell 1 day ago
This seems contradictory to me. I suspect most experienced professionals start with the premise that the LLM is untrustworthy due to its nature. If they didn't research the tool and its limitations, that's lazy. At some point, they stopped believing in this limitation and offloaded more of their thinking to it. Why did they stop? I can't think of a single reason other than being lazy. I don't accept the premise that it's because the tool responded quickly, confidently, and clearly. It did that the first 100 times they used it when they were probably still skeptical.
Am I missing something?
Also, I won't remotely claim that it's the case here, but external pressures regularly push people into doing the wrong thing. It doesn't mean anyone is blameless, but ignoring those pressures or the right (or wrong) stimuli makes it a lot harder to actually deal with situations like this.
Fair point. My intention isn't to be absolute, though. Even in a relative sense, I can't imagine a scenario where some level of laziness didn't contribute to the problem, even in the presence of external factors.
It seems like the author was eliminating laziness with their statement and instead putting the primary force on the LLM being "confident." This is what I'm pushing back against.
Most people don't actually critically evaluate LLMs for what they are, and actually buy into the hype that it's a super-intelligence.
In particular, LLMs seem very good at passing the initial smell test, which I'd imagine is the first line of defense for most in determining whether to trust info. And unless it's something critical, most people probably wouldn't deem looking at sources worthwhile.
Lately I've been running many queries against multiple LLMs. Not as good as organic thinking but comparing two does at least involve a bit of judgement as to which set of info is superior. Probably not the most eco friendly solution....
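A minimal sketch of that cross-checking workflow, assuming you wrap each provider's client in a simple callable (the model entries below are hypothetical stubs, not real API calls):

```python
from typing import Callable

def cross_check(prompt: str, models: dict[str, Callable[[str], str]]) -> dict[str, str]:
    """Send the same prompt to several models and collect the answers for comparison."""
    answers = {name: ask(prompt) for name, ask in models.items()}
    distinct = set(answers.values())
    if len(distinct) > 1:
        # Disagreement is the useful signal: it forces you to judge which answer is better.
        print(f"Models disagree ({len(distinct)} distinct answers) -- judgement required.")
    return answers

# Stubbed 'models' for illustration; in practice these would call real provider APIs.
answers = cross_check("Capital of Australia?", {
    "model_a": lambda p: "Canberra",
    "model_b": lambda p: "Sydney",
})
```

The point isn't that majority vote is truth; it's that a disagreement between models is a cheap trigger for actually checking a source.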
It's not? Why not? It's a "wake-up call", it's a "warning shot", but heaven forbid it's a rant against AI.
To me it's like someone listing off deaths from fentanyl, how it's destroyed families, ruined lives, but then tossing in a disclaimer that "this isn't a rant against fentanyl". In my view, the ways that people use and are drawn into AI usage has all the hallmarks of a spiral into drug addiction. There may be safe ways to use drugs but "distribute them for free to everyone on the internet" is not among them.
Or what?
I'd say the better word for that is polarising rather than political, but they're synonyms these days.
I think my biggest concern with AI is that its biggest proponents have the least wisdom imaginable. I'm deeply concerned that our technocrats are running full speed at AGI with essentially zero plan for what happens if it "disrupts" 50% of jobs in a shockingly short period of time, or worse outcomes. (There's some evidence the new tariff policies were generated with LLMs... it's probably already making policy. But it could be worse. What happens when bad actors start using these things to intentionally gaslight the population?)
But I actually think AI (not AGI) as an assistant can be helpful.
Speaking of Wisdom and a different "AGI", I think there's an old Dungeons and Dragons joke that can be reworked here:
Intelligence is knowing that an LLM uses vector embeddings of tokens.
Wisdom is knowing LLMs shouldn't be used for business rules.
At the individual level, AI is useful as a helper for your generative tasks. I'd argue against analytic tasks, but YMMV.
At the societal level, you as an individual cannot trust anything society has produced, because it's likely some AI-generated bullshit.
Some time ago, if you were not trusting a source, you could build your understanding by evaluating a plurality of sources and perspectives and get to the answer in a statistical manner. Now every possible argument can be stretched in any possible dimension and your ability to build a conclusion has been ripped away.
A few thousand years of pre-LLM primary sources remain available for evaluation by humans and LLMs.
Reality/truth/history has always been an expensive pursuit in the face of evolving pollutants.
Every work had multiple versions. All versions were different. Some versions were diametrically opposed to others.
Have a look at Bible scholarship to see just _how_ divergent texts can become by nothing more than scribe errors.
I think you're right my analogy is imperfect. I'm only human (or am I? :P)
A better example would have been the complaint tablet to Ea-nāṣir. We're pretty sure it's real; there might still be people alive who remember it being discovered. But in a hundred years, once people with gen AI have created museums of fake but plausible artifacts, can future people be sure? A good fraction of the US population today believes wildly untrue things about events happening in real time!
There are actually export statistics (with obvious errors, possibly fraud) for these islands. Someone probably stuck the numbers in a formula without digging a little deeper.
It's probably the most sane aspect of the whole thing.
Russia, North Korea and a handful of other countries were spared, likely because they sided with the US and Russia at the UN General Assembly on Feb 24 of this year, in voting against "Advancing a comprehensive, just and lasting peace in Ukraine." https://digitallibrary.un.org/record/4076672
EDIT: Found it: https://nitter.net/krishnanrohit/status/1907587352157106292
Also discussed here: https://www.latintimes.com/trump-accused-using-chatgpt-creat...
The theory was first floated by Destiny, a popular political commentator. He accused the administration of using ChatGPT to calculate the tariffs the U.S. is charged by other countries, "which is why the tariffs make absolutely no fucking sense."
"They're simply dividing the trade deficit we have with a country with our imports from that country, or using 10%, whichever is greater," Destiny, who goes by @TheOmniLiberal on X, shared in a post on Wednesday.
> I think they asked ChatGPT to calculate the tariffs from other countries, which is why the tariffs make absolutely no fucking sense.
> They're simply dividing the trade deficit we have with a country with our imports from that country, or using 10%, whichever is greater. https://t.co/Rc45V7qxHl pic.twitter.com/SUu2syKbHS
> — Destiny | Steven Bonnell II (@TheOmniLiberal) April 2, 2025
He attached a screenshot of his exchange with the AI bot. He started by asking ChatGPT, "What would be an easy way to calculate the tariffs that should be imposed on other countries so that the US is on even-playing fields when it comes to trade deficit? Set minimum at 10%."
"To calculate tariffs that help level the playing field in terms of trade deficits (with a minimum tariff of 10%), you can use a proportional tariff formula based on the trade deficit with each country. The idea is to impose higher tariffs on countries with which the U.S. has larger trade deficits, thus incentivizing more balanced trade," the bot responded, along with a formula to use.
John Aravosis, an influencer with a background in law and journalism, shared a TikTok video that outlined how each tariff was calculated: by essentially taking the U.S. trade deficit with the country and dividing it by the total imports from that country to the U.S.
"Guys, they're setting U.S. trade policy based on a bad ChatGPT question that got it totally wrong. That's how we're doing trade war with the world," Aravosis proclaimed before adding the stock market is "totally crashing."
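For what it's worth, the formula described in the screenshot and the TikTok breakdown reduces to a one-liner. A sketch (the figures in the example are made up for illustration, not real trade data):

```python
def implied_tariff(trade_deficit: float, imports: float, floor: float = 0.10) -> float:
    """Reproduce the reported formula: trade deficit divided by imports, floored at 10%."""
    if imports <= 0:
        return floor  # no imports from the country: only the 10% floor applies
    return max(trade_deficit / imports, floor)

# Hypothetical numbers:
print(implied_tariff(trade_deficit=50.0, imports=100.0))  # 0.5 -> a 50% rate
print(implied_tariff(trade_deficit=2.0, imports=100.0))   # 0.1 -> floored at 10%
```

Note the formula only looks at the bilateral goods deficit; it has no term for services, supply chains, or existing tariff levels, which is why critics called the results nonsensical.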
If we were headed straight into the AGI era then hey, problem solved: intelligent general machines that can advance towards solutions in a coherent if not human-like fashion are one thing. But that's not what AI is today.
AI today is enormously unreliable and very limited in a dangerous way, namely that it looks more capable than it is.
If a human generates a story containing Count Dracula, that doesn't mean vampires are real, or that capabilities like "turning into a cloud of bats" are real, or that the algorithm "thirsts for the blood of the innocent."
The same holds when the story comes from an algorithm, and it continues to hold when the story is about a character named "AI Assistant" who is "helpful".
Getting people to fall for this illusion is great news for the companies though, because they can get investor-dollars and make sales with the promise of "our system is intelligent", which is true in the same sense as "our system converts blood into immortality."
The false promises of the AI companies and the false expectations of the management and users.
Had it just recently with a data migration: the users asked whether they still needed to enter metadata for documents, since they could just use AI to query data that was previously based on that metadata.
They trust AI before it's even there, and don't even consider a transition period where they check whether the results are correct.
As with security, convenience prevails.
“It’ll change everything!” they said, as they continued to put money in their pockets as people were distracted by the shiny object.
NFTs didn't change much, money changed its owner
If your LLM + pre-prompt setup sounds confident with every response, something is probably wrong; it doesn't have to be that way. It isn't for me. I haven't collected statistics, but I often get decent nuance back from Claude.
Think more about what you're doing and experiment. Try different pre-prompts. Try different conversation styles.
This is not dismissing the tendency for overconfidence, sycophancy, and more. I'm just sharing some mitigations.
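For example, one illustrative (not canonical) pre-prompt along these lines has worked reasonably well for me:

```text
You are a careful assistant. For every factual claim, state your confidence
(high / medium / low). If you are unsure, say so explicitly rather than
guessing. Flag any part of your answer you could not verify.
```

No pre-prompt eliminates overconfidence, but asking for explicit confidence levels at least gives you something to push back on.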
Ask on a Wednesday. During a full moon. While in a shipping container. Standing up. Keep a black box on your desk as the sacred GenAI avatar and pray to it. Ask while hopping on one leg.
The short answer is: you can know for a fact that it _isn't_ thinking more carefully because LLMs don't actually think at all, they just parrot language. LLMs are performing well when they are putting out what you want to hear, which is not necessarily a well thought out answer but rather an answer that LOOKS well thought out.
2. While the question of "is the AI thinking" is interesting, I think it is a malformed question. Think about it: how do you make progress on that question, as stated? My take: it is unanswerable without considerable reframing. It helps to reframe toward something measurable. Here, I would return to the original question: to what degree does an LLM output calibrated claims? How often does it make overconfident claims? Underconfident claims?
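One concrete way to make "calibrated claims" measurable is a Brier score over (stated confidence, actual correctness) pairs. A minimal sketch (the sample data is hypothetical):

```python
def brier_score(predictions: list[tuple[float, bool]]) -> float:
    """Mean squared error between stated confidence (0..1) and the outcome.

    0.0 means perfectly calibrated and always right; confident-but-wrong
    answers (overconfidence) push the score up sharply.
    """
    return sum((conf - float(correct)) ** 2 for conf, correct in predictions) / len(predictions)

# Hypothetical eval: the model was 0.9 confident and right, 0.8 and wrong, 0.6 and right.
sample = [(0.9, True), (0.8, False), (0.6, True)]
print(round(brier_score(sample), 3))  # -> 0.27
```

Scoring an LLM this way requires extracting a numeric confidence from its output, which is its own research problem; but it turns "is it overconfident?" into something you can plot rather than argue about.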
3. Pretending requires at least metacognition, if not consciousness. Agree? It is a fascinating question to explore how much metacognition a particular LLM demonstrates.
In my view, this is still a research question, both in terms of understanding how LLM architectures work as well as designing good evals to test for metacognition.
In my experience, when using chain-of-thought, LLMs can be quite good at recognizing previous flaws, including overconfidence, meaning that if one is careful, the LLM behaves as if it has a decent level of metacognition. But to see this, the driver (the human) must demonstrate discipline. I'm skeptical that most people prompt LLMs rigorously and carefully.
4. It helps to discuss this carefully. Word choice matters a lot in AI discussions, much more than even a relatively capable software developer / hacker is comfortable with. Casual phrasings are likely to lead us astray. I'll make a stronger claim: a large fraction of successful tech people haven't yet developed clear language and thinking for discussing classic machine learning, much less AI as a field or LLMs in particular. But many of these people lack the awareness or mindset to remedy this; they fall into the usual overconfidence or lack-of-curiosity traps.
5. You wrote: "LLMs are performing well when they are putting out what you want to hear."
I disagree; instead, I claim people, upon reflection, would prefer an LLM to be helpful, useful, and truthful. This often means correcting mistakes or challenging assumptions. Of course people have short-term failure modes; such is human nature. But when you look at most LLM eval frameworks, you'll see that truth and safety are primary factors. Yes-manning, or sycophancy, is still a problem.
6. Many of us have seen the "LLMs just parrot language" claim repeated many times. After having read many papers on LLMs, I wouldn't use the words "LLMs just parrot language". Why? That phrase is more likely to confuse discussion than advance it.
I recommend this to everyone: instead of using that phrase, challenge yourself to articulate at least two POVs relating to the "LLMs are stochastic parrots" argument. Discuss with a curious friend or someone you respect. If it is just someone online you don't know, you might simply dismiss them out of hand.
The "stochastic parrot" phrase is fun and makes a catchy title for an AI researcher who wants to get their paper noticed. But it isn't a great phrase for driving mutual understanding, particularly not on a forum like HN where our LLM foundations vary widely.
Having said all this, if you want to engage on the topic at the object level, there are better fora than HN for it. I suggest starting with a literature review and finding an ML or AI-specific forum.
7. There is a lot of confusion and polarization around AI. We are capable of discussing better, but (a) we have to want to; (b) we have to learn now; and (c) we have to make time to do it.
Like I wrote in #6, above, be mindful of where you are discussing and the level of understanding of people around. I've found HN to be middling on this, but I like to pop in from time to time to see how we're doing. The overconfidence and egos are strong here, arguably stronger than the culture and norms that should help us strive for true understanding.
8. These are my views only. I'm not "on one side", because I reject the false dichotomy that AI-related polarization might suggest.
It does not help that his examples of things an imaginary LLM might miss are all very subjective and partisan too.
Of course, every ranter wants to be seen that way, and so a protest that something isn't a rant against X is generally a sign that it absolutely is a rant against X that the author is pre-emptively defending.
• Instead of validating sources, they assumed the AI had already done so.
• Instead of assessing multiple perspectives, they integrated and edited the AI’s summary and moved on.
These are points against certain actions with a tool, not against the tool.
AI is for the starting point not the final result.
AI must never be the last step, but it often is, because people trust computers, especially when they answer in confident language.
It's the ELIZA effect all over again.
> The study revealed a clear pattern: the more confidence users had in the AI, the less they thought critically
And the study didn't even check that. They just plotted the correlation between how much users think they rely on AI vs. how much effort they think they saved. Isn't that expected to be positive even if they think just as critically?
[1]: https://www.microsoft.com/en-us/research/wp-content/uploads/...
No one ever considers that maybe they all did lower our attention spans, prevent us from learning as well as we used to, etc., and now we're at a point where we can't afford to keep losing intelligence and attention span.
One of the famous Greek philosophers complained that books were hurting people's minds because they no longer memorized information, so this kind of complaint is as old as civilization itself. There is no evidence that we would be on Mars by now already if we had never invented books or television.
Seriously though, that's a horrible bowdlerization of the argument in the Phaedrus. It's actually very subtle and interesting, not just reactionary griping.
If I can solve two problems in a near constant time that is a few hours, what is the value of solving the problem which takes days to reason through?
I suspect that as the problem spaces diverge enough you’ll have two skill sets. Who can solve n problems the fastest and who can determine which k problems require deep thought and narrow direction. Right now we have the same group of people solving both.
Gell-Mann Amnesia. Attention span limits the amount of information we can process, and with attention spans decreasing, increases in information flow stop having a positive effect. People simply forget what they started with, even when it contradicts previous information.
> If I can solve two problems in a near constant time that is a few hours, what is the value of solving the problem which takes days to reason through?
You don't end up solving the problem in near constant time, you end up applying the last suggested solution. There's a difference.
Just like this is a rant against irresponsible use of AI.
Hope this helps
You shall not criticize the profit!
People is why we can't have anything nice. It sucks.
I have medical reasons to take opioids, but in the eyes of people, I am a junkie. I would not be considered a junkie if I kept popping ibuprofen. It is silly. Opioids do not even make me high to begin with (it is complicated).
Or if not, then what, is it not true that both substances and AI can be used responsibly, and irresponsibly?
"People is why we can't have anything nice. It sucks." is also true, and applies to many things: just consider vending machines alone, or bags in public (for dog poop) and anything of the sort. We no longer have bags, because people stole them. A great instance of "this is why we can't have nice things". Pretty sure you can think of more.
Make the down-votes make sense, please.
(I do not care about the down-votes per se, I care about why I am being disagreed with without any responses.)
Now that we have thinking models and methodology to train them, surely before long it will be possible to have a model that is very good at the kind of thinking that an expert OSINT analyst knows how to do.
There are so many low hanging fruit applications of existing LLM strengths that have simply not been added to the training yet, but will be at some point.
A director of NSA, pre 9/11, once remarked that the entire organization produced about two pieces of actionable intelligence a day, and about one item a week that reached the President. An internal study from that era began "The U.S. Government collects too much information".
But that was from the Cold War era, when the intelligence community was struggling to find out basic things such as how many tank brigades the USSR had. After 9/11, the intel community had to try to figure out what little terrorist units with tens of people were up to. That required trolling through far too much irrelevant information.
There's definitely a metaphor to be made for trolling for data, that GP could have been intentionally making. I've certainly seen that idiom used before, although it could have been an eggcorn [2] for trawling.
[0] https://en.wikipedia.org/wiki/Trolling_(fishing)
[1] https://en.wikipedia.org/wiki/Troll_(slang)#Origin_and_etymo...
[0] https://samrawal.substack.com/p/the-human-ai-reasoning-shunt
A con man often uses the illusion of confidence to gain trust, though that's not the only way. The reverse also works: gain their trust by seeming unconfident and incapable, and thus easily taken advantage of.
- "Final FIXED & WORKING drawing.html" (it wasn't working at all)
- "Full, Clean, Working Version (save as drawing.html)" (not working at all)
- "Tested and works perfectly with: Chrome / Safari / Firefox" (not working at all)
- "Working Drawing Canvas (Vanilla HTML/JS — Save this as index.html)" (not working at all)
- "It Just Works™" (not working at all)
The last one was so obnoxious I moved over to Claude (3.5 Sonnet) and it knocked it out in 3-5 prompts.
They are much better at fractally subdividing and interpreting inputs like a believer of a religion than at deconstructing and iteratively improving things like an engineer. It's a waste of token count trying to have such discussions with an LLM.
Even if my prompt was low-quality, it doesn't matter. It's confidently stating that what it produced was both tested and working. I personally understand that's not true, but of all the safety guards they should be putting in place, not lying should be near the top of the list.
The idea that humans in general actually do any thinking is demonstrably false.
"But the tradecraft is slipping. Analysts are skipping the hard parts. They’re trusting GenAI to do the heavy cognitive lifting, and it’s changing how we operate at a foundational level."
Next we're going to be hearing about how participation trophies and DEI are also contributing to this imagined "problem."
For a second I thought you were talking about the fact we all have jobs doing exactly that!
Hopefully narrowed by team, role and task..
Besides, "OSINT" has been busy posting scareware for years, even before "AI".
There's so much spam that you can't figure out what the real security issues are. Every other "security article" is about "an attacker" that "could" obtain access if you were sitting at your keyboard and they were holding a gun to your head.