Posted by nnx 4/1/2025

The case against conversational interfaces (julian.digital)
279 points | 217 comments | page 2
DabeDotCom 4/1/2025|
> It was like they were communicating telepathically.

> That is the type of relationship I want to have with my computer!

The problem is, "The Only Thing Worse Than Computers Making YOU Do Everything... Is When They Do Everything *FOR* You!"

"ad3} and "aP might not be "discoverable" vi commands, but they're fast and precise.

Plus, it's easier to teach a human to think like a computer than to teach a computer to think like a human — just like it's easier to teach a musician to act than to teach an actor how to play an instrument — but I admit, it's not as scalable; you can't teach everyone Fortran or C, so we end up looking for these Pareto Principle shortcuts: Javascript provides 20% of the functionality, and solves 80% of the problems.

But then people find Javascript too hard, so they ask ChatGPT/Bard/Gemini to write it for them. Another 20% solution (20% of the original 20%, so 4% as featureful), but it solves 64% of the world's problems. (And it's on pace to consume 98% of the world's electricity, but I digress!)

PS: Mobile interfaces don't HAVE to suck for typing; I could FLY on my old Treo! But "modern" UI eschews functionality for "clean" brutalist minimalism. "Why make it easy to position your cursor when we spent all that money developing auto-conflict?" «sigh»

grbsh 4/1/2025|
I think we can have the best of both worlds here. We want the precision and speed of using vi commands, but we want the discoverability of GUI document editors. LLMs may be able to solve the discoverability problem. If the editor can be highly confident that you want to use a given command, for example, it can give you an IntelliSense-like completion option. I don't think we've cracked the code on how this UX should work yet, though -- as evidenced by how many people find Cursor/Copilot autocompletion suggestions so frustrating.
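
Something like this toy sketch is the shape I have in mind (all names invented; classify_intent is a stub where an LLM would actually go):

    # Toy sketch: surface a power-user command only when the guessed
    # intent clears a high confidence bar; otherwise stay silent.
    SHORTCUTS = {
        "delete-paragraphs": '"ad3}',
        "paste-register": '"aP',
    }

    def classify_intent(recent_edits):
        # Stub standing in for an LLM scoring the user's recent actions.
        if len(recent_edits) >= 3 and all(e == "delete-line" for e in recent_edits):
            return "delete-paragraphs", 0.95
        return "unknown", 0.0

    def maybe_suggest(recent_edits, threshold=0.9):
        intent, confidence = classify_intent(recent_edits)
        shortcut = SHORTCUTS.get(intent)
        if shortcut and confidence >= threshold:
            return "Tip: %s does this in one go" % shortcut
        return None  # below threshold: no popup, no interruption

    print(maybe_suggest(["delete-line"] * 4))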

The other great thing about this mode is that it can double as a teaching methodology. If I have a complicated interface that is not very discoverable, it may be hard to sell potential users on the time investment required to learn everything. Why would I want to invest hours into learning non-transferable knowledge when I'm not even sure I want to go with this option versus a competitor? It will be a far better experience if I can first vibe-use the product, and if it's right for me, I'll probably be incentivized to learn its inner workings as I try to do more and more.

Izkata 4/1/2025||
> We want the precision and speed of using vi commands, but we want the discoverability of GUI document editors.

> The other great thing about this mode is that it can double as a teaching methodology.

gvim has menus, and it puts the commands in the menus as shortcuts. That's how I learned that vim has folding, and how to use it.

earcar 4/1/2025||
Who's actually making the claim that we should replace everything with natural language? Almost nobody serious. This article sets up a bit of a strawman while making excellent points.

What we're really seeing is specific applications where conversation makes sense, not a wholesale revolution. Natural language shines for complex, ambiguous tasks but is hilariously inefficient for things like opening doors or adjusting volume.

The real insight here is about choosing the right interface for the job. We don't need philosophical debates about "the future of computing" - we need pragmatic combinations of interfaces that work together seamlessly.

The butter-passing example is spot on, though. The telepathic anticipation between long-married couples is exactly what good software should aspire to. Not more conversation, but less need for it.

Where Julian absolutely nails it is the vision of AI as an augmentation layer rather than replacement. That's the realistic future - not some chat-only dystopia where we're verbally commanding our way through tasks that a simple button press would handle more efficiently.

The tech industry does have these pendulum swings where we overthink basic interaction models. Maybe we could spend less time theorizing about natural language as "the future" and more time just building tools that solve real problems with whatever interface makes the most sense.

mattmanser 4/1/2025|
I don't think it's a straw man: there are lots of people who think it might, or who are under the vague impression that it might. Plenty of less technical people. Because they haven't thought it through.

The article is useful because it articulates arguments which many of us have intuited but are not necessarily able to explain ourselves.

nottorp 4/1/2025||
> because after 50+ years of marriage he just sensed that she was about to ask for it. It was like they were communicating telepathically.

> That is the type of relationship I want to have with my computer!

Does he mean automation of routine tasks? It took 50 years to reach that in the example.

What if you want to do something new? Will the thought guessing module in your computer even allow that?

chongli 4/1/2025|
I don't know, but I feel like we already have the "telepathic grandfather interface." Or at least we try to have it. My iPhone is constantly guessing at things to suggest to me (I use the share button a lot in different apps) and it's wrong more often than not, forcing me to constantly hunt for things (to say nothing about autocorrect, which is constantly changing correct words that I'd previously typed into incorrect ones)! It doesn't even use a basic, sensible LRU eviction policy. It has some totally inscrutable method of determining what to suggest!
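
For contrast, the "basic, sensible LRU" policy I'm wishing for fits in a dozen lines (a toy sketch, obviously not whatever Apple actually does):

    # Toy sketch: a recents list with plain LRU eviction. Most recently
    # used share targets stay at the front; nothing gets reshuffled
    # behind your back.
    from collections import OrderedDict

    class RecentTargets:
        def __init__(self, capacity=4):
            self.capacity = capacity
            self.items = OrderedDict()

        def use(self, target):
            self.items.pop(target, None)        # forget the old position
            self.items[target] = True           # re-insert as most recent
            while len(self.items) > self.capacity:
                self.items.popitem(last=False)  # evict least recently used

        def suggestions(self):
            return list(reversed(self.items))   # most recent first

    r = RecentTargets()
    for t in ["Messages", "Mail", "Notes", "Messages"]:
        r.use(t)
    print(r.suggestions())  # ['Messages', 'Notes', 'Mail']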

If we want an interface that actually lets us work near the speed of thought, it can't be anything that re-arranges options behind our back all the time. Imagine if you went into your kitchen to cook something and the contents of all your drawers and cupboards had been re-arranged without your knowledge! It would be a total nightmare!

We already knew decades ago that spatial interfaces [1] are superior to everything else when it comes to working quickly. You can walk into a familiar room and instinctively turn on a light by reaching for the switch without even looking. With a well-organized kitchen an experienced chef (or even a skilled home cook) can cook a very complicated dish very efficiently when they know where all of the utensils are so that they don't need to go hunting for everything.

Yet today it seems like all software is constantly trying to guess what we want and in the process ends up rearranging everything so that we never feel comfortable using our computers anymore. I REALLY miss using Mac OS 9 (and earlier). At some point I need to set up some vintage Macs to use it again, though its usefulness at browsing the web is rather limited these days (mostly due to protocol changes, but also due to JavaScript). It'd be really nice to have a modern browser running on a vintage Mac, though the limited RAM would be a serious problem.

[1] https://arstechnica.com/gadgets/2003/04/finder/

nottorp 4/1/2025|||
> With a well-organized kitchen an experienced chef (or even a skilled home cook) can cook a very complicated dish very efficiently when they know where all of the utensils are so that they don't need to go hunting for everything.

Even I can make breakfast without looking in my kitchen, because I know where all the needed stuff is :)

On another topic, it doesn't have to look well organized. My home office looks like a bomb exploded in it, but I know exactly where everything is.

> I REALLY miss using Mac OS 9 (and earlier).

I was late to the Mac party, around the Snow Leopard days. I definitely remember that back then OS X applications weren't allowed to steal focus from what I had in the foreground. These days every idiotic splash screen steals my typing.

albertsondev 4/1/2025|||
This right here is probably my single biggest complaint with modern computing. It's a phenomenon I've taken to calling, in daily life, "tools trying to be too damn smart for their own good". I detest it. I despise it. Many of the evils of the modern state of tech--algorithmic feeds, targeted advertising, outwardly user-hostile software that goes to incredible lengths to kneecap your own ability to choose how to use it--so, so much of it boils down to tools, things that should be extensions of their users' wills, being designed to "think" they know better what the user wants to do than the users themselves.

I do not want my software, designed more often than not by companies with adversarial ulterior motives, to attempt to decide for me what I meant to watch, to listen to, to type, to use, to do. It flies in the face of the function of a tool, it robs people of agency, and above all else it's frankly just plain annoying having to constantly correct and work around these assumptions made based on spherical users in frictionless vacuums and tuned for either the lowest common denominator or whatever most effectively boosts some handful of corporate metrics-cum-goals (usually both). I want my computer to do what I tell it to, not what it (or rather, some bunch of brainworm-infested parasites on society locked in a boardroom) thinks I want to do.

I can make exceptions for safety-critical applications. I do not begrudge my computer for requiring additional confirmation to rm -rf root, or my phone for lowering my volume when I have it set stupidly loud, or my car for having overly-sensitive emergency stop or adaptive cruise functions. These cases also all, crucially, have manual overrides. I can add --no-preserve-root, crank my volume right back up, and turn off cruise control and control my speed with the pedals. Forced security updates I only begrudge for their tendency to serve as justification or cover for shipping anti-features alongside.

Autocorrecting the word "fuck" out of my vocabulary, auto-suggesting niche music out of my listening, and auto-burying posts from my friends who don't play the game out of my communications are not safety-critical. Let computers be computers. Let them do what I ask of them. Let me make the effort of telling them what that is. Is that so much to ask?

rimeice 4/1/2025||
Every product with a UI has had its own UI built for it, with specific shortcuts and specific techniques you learn in order to use that tool. I don't see why the same couldn't apply to speech interfaces. The article does mention that we haven't figured out the speech equivalent of shortcuts like the thumbs-up yet, but doesn't explore that further. I can imagine specific words or combinations of words that you have to learn being used to control certain software. Eventually there would be some unification for common tasks.
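
E.g. a learned speech-shortcut table, the voice equivalent of keybindings, might look something like this (phrases and action names entirely made up):

    # Hypothetical speech-shortcut table: terse learned phrases instead
    # of full sentences, like keyboard shortcuts but for the voice.
    COMMANDS = {
        ("mark", "done"): "task.complete",
        ("push", "review"): "vcs.open_pull_request",
        ("loop", "last"): "media.repeat_previous",
    }

    def dispatch(utterance):
        words = tuple(utterance.lower().split())
        action = COMMANDS.get(words)
        if action is None:
            return "no shortcut; fall back to full conversation"
        return "run: " + action

    print(dispatch("mark done"))                      # run: task.complete
    print(dispatch("please mark this task as done"))  # fallback
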
Arainach 4/1/2025|
Speaking is fundamentally slower than typing or using a mouse, and it is a catastrophically bad choice if you are not alone in a room.
macleginn 4/1/2025||
I agree with some of the sentiments in the post, but I am somewhat surprised by the framing. Why make 'a case' against something that will clearly win or lose depending on adoption? Is the author suggesting that we should not be betting our money or resources on developing this? In that case we need more details for particular use cases, I would say.

fellerts 4/1/2025||
> To put the writing and speaking speeds into perspective, we form thoughts at 1,000-3,000 words per minute. Natural language might be natural, but it’s a bottleneck.

Natural language is very lossy: forming a thought and conveying that through speech or text is often an exercise in frustration. So where does "we form thoughts at 1,000-3,000 words per minute" come from?

The author clearly had a point about the efficiency of thought vs. natural language, but his thought was lost in a layer of translation. Probably because thoughts don't map cleanly onto words: I may lack some prerequisite knowledge to grasp what the author is saying here. That pokes at the core of the issue: language is imperfect, so the statement "we form thoughts at 1,000-3,000 words per minute" makes no sense to me.

Meta-joking aside, is "we form thoughts at 1,000-3,000 words per minute" an established fact? It's oddly specific.

paulluuk 4/1/2025|
I'm also curious about this -- I'm pretty sure that I think actual words at about the speed at which I can speak them. I cannot speak 3,000 words per minute.

I also have my doubts about the numbers put forward on reading, listening, and speaking. Again, I can read words about as fast as I can speak them. When I'm reading, I am essentially speaking out the words, but in my mind. Is that not how other people read?

fellerts 4/1/2025|||
It sounds like you have a strong inner monologue. Some people do, some don't. I don't subvocalize (no inner voice when reading), and words aren't involved when I think about stuff. I don't have an inner "voice" at all, except when I'm trying to communicate. Maybe I need to do more "translating" from thought to voice than you do?

This stuff is fascinating.

whatevertrevor 4/1/2025|||
Nope. Plenty of people don't have an internal monologue, and even for those who do, it's not on all the time.

For me, when I need to think clearly about a specific/novel thing, a monologue helps, but I don't voice out thoughts like "I need a drink right now".

Also, I read much faster than I speak; as a result I have to slow down while reading fiction.

3l3ktr4 4/1/2025||
I disagree with the author when they say something along the lines of "why don't we use buttons instead of these new assistive technologies? Buttons are much faster, and I've shown humans like fast." I think that's false. Why, after 10 years of software development, haven't I learned Emacs? Because I'm lazy, and because I don't think it's the bottleneck of my work. My bottleneck might be creativity or knowledge, and conversational interfaces might be the best thing there is for those (in the absence of a knowledgeable and kind human, which the author also seems to agree with). Anyway, I don't know; I found the title a bit disconnected from the content and the conclusions a bit overlapping and confusing, but this is a complicated question. In the end I agree that we want a mix of things: we want a couple of keyboard strokes and we want chats. But most of all we probably want a direct brain interface! ;)
eviks 4/1/2025||
> but we’ve never found a mobile equivalent for keyboard shortcuts. Guess why we still don’t have a truly mobile-first productivity app after almost 20 years since the introduction of the iPhone?

Has it even been tried? Is there an iPhone text editing app with a fully customizable keyboard that allows for setting up modes/gestures/shortcuts, scriptable if necessary?

> A natural language prompt like “Hey Google, what’s the weather in San Francisco today?” just takes 10x longer than simply tapping the weather app on your homescreen.

That's not entirely fair: the natural-language route could just as well be side button + saying "Weather", with the same result. Though you can make the app even more available by just displaying weather results on the homescreen, without any tapping.

walterbell 4/1/2025|
The Blackberry physical keyboard had many shortcuts: https://defkey.com/blackberry-10-classic-shortcuts

iPad physical keyboards also have shortcuts.

eviks 4/1/2025||
These are both desktop equivalents, using an actual desktop keyboard or a mini variant thereof.

walterbell 4/1/2025||
Why is Blackberry a desktop equivalent? It preceded the iPhone by many years, with unique workflows that varied by model.

eviks 4/1/2025||
Because it's literally a physical (i.e., desktop-style) keyboard, just smaller, while almost all current mobile interfaces are touch-based. (Also, the question wasn't about uniqueness but about the productivity level of a desktop productivity app; think of code editors with extensions, or keyboard and mouse gesture customization.)

What did they have in their touch interfaces?

walterbell 4/1/2025||
For most of their existence, Blackberry had no touch interface. One appeared in later versions as they tried to compete with Android and iPhone. One example of a "mobile keyboard" shortcut was long pressing a physical key to launch a specific function.

It might be hard to understand now, but Blackberry power users could be much more productive with email/texting than anyone on a phone that exists today. But they were special-purpose 2-way radio (initially, pager) devices that lacked the flexibility of modern apps with full internet data access.

gatinsama 4/1/2025|
It is a huge turnoff for me when futuristic series use conversational interfaces. It happened in The Expanse and was hard to watch. For anyone who likes to think, learn, and tinker with user interfaces (HCI in general), it's obviously a high-latency, noisy channel.
internet_points 4/1/2025||
I actually found that quite reasonable. E.g. they were using it to sort and filter data, just like people today use LLMs to write their R scripts and (avoid having to) figure out how to invoke gnuplot. I'm sure somewhere in that computer it's still invoking gnuplot under a century of vibe-coded moldy spaghetti code =P

I don't remember where else they used voice; they had a lot of other interface types they switched between. Tried searching for a clip and found this quote:

    > The voice interface had been problematic from the start. 
    > The original owner was Chinese so, I turned the damn thing off.
So yes, quite realistic :-)
woile 4/1/2025||
I think The Expanse nails it quite well. I really like it when they move the videos from one screen to another. Or when they interact with the ship, they use all kinds of outputs: voice, screens, buttons. For planning together, they talk and the machine renders, but then they have screens or even bracelets to interact with.