Posted by mossTechnician 1 day ago
And LLM inference is heavily memory bandwidth bound (reading input tokens isn't though - so it _could_ be useful for this in theory, but usually on device prompts are very short).
So if you are memory bandwidth bound anyway and the NPU doesn't provide any speedup on that front, it's going to be no faster. But NPUs have loads of other gotchas, so there's no real "SDK" format for them.
Note the idea isn't bad per se, it has real efficiencies when you do start getting compute bound (eg doing multiple parallel batches of inference at once), this is basically what TPUs do (but with far higher memory bandwidth).
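To put rough numbers on the bandwidth argument, here is a back-of-the-envelope sketch. It assumes every generated token has to read all of the weights once, so single-stream decode speed is capped at bandwidth divided by model size. The 15 GB model and the 100 GB/s bandwidth are illustrative assumptions, not measurements of any particular device.

```python
GB = 1e9

def max_decode_tokens_per_sec(model_bytes: float, bandwidth_bytes_per_sec: float) -> float:
    """Upper bound on single-stream decode speed if all weights are read once per token."""
    return bandwidth_bytes_per_sec / model_bytes

def batched_tokens_per_sec(model_bytes: float, bandwidth_bytes_per_sec: float, batch_size: int) -> float:
    """One pass over the weights serves the whole batch.

    Ignores KV-cache traffic and the compute ceiling, so this is optimistic.
    """
    return batch_size * max_decode_tokens_per_sec(model_bytes, bandwidth_bytes_per_sec)

# Assumed figures, for illustration only.
MODEL_BYTES = 15 * GB
BANDWIDTH = 100 * GB

single = max_decode_tokens_per_sec(MODEL_BYTES, BANDWIDTH)
batched = batched_tokens_per_sec(MODEL_BYTES, BANDWIDTH, batch_size=8)
print(f"single stream: at most {single:.1f} tok/s")
print(f"batch of 8:    at most {batched:.1f} tok/s in total")
```

Under these assumptions a faster matmul unit does not move the single-stream number at all; only more bandwidth, a smaller model, or batching does, which is where extra compute starts to matter.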
> usually on device prompts are very short
Sure, but that might change with better NPU support, making time-to-first-token quicker with larger prompts.
I don’t know how good these neural engines are, but transistors are dead-cheap nowadays. That makes adding specialized hardware a valuable option, even if it doesn’t speed up things but ‘only’ decreases latency or power usage.
"You multiply matrices of INT8s."
"OH... MY... GOD"
NPUs really just accelerate low-precision matmuls. A lot of them are based on systolic arrays, which are like a configurable pipeline through which data is "pumped" rather than a general purpose CPU or GPU with random memory access. So they're a bit like the "synergistic" processors in the Cell, in the respect that they accelerate some operations really quickly, provided you feed them the right way with the CPU and even then they don't have the oomph that a good GPU will get you.
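For anyone who hasn't seen it spelled out, a minimal pure-Python sketch of "multiply matrices of INT8s" follows. The loop order loosely mirrors an output-stationary array, where each cell keeps its own accumulator and does one multiply-accumulate per step as operands are pumped past it. It does not model the timing or data skew of real hardware, and the function names are made up for illustration.

```python
INT8_MIN, INT8_MAX = -128, 127

def check_int8(matrix):
    """Reject any value that would not fit in a signed 8-bit integer."""
    for row in matrix:
        for value in row:
            if not INT8_MIN <= value <= INT8_MAX:
                raise ValueError(f"{value} does not fit in INT8")

def int8_matmul(a, b):
    """Multiply INT8 matrices, accumulating in wider integers."""
    check_int8(a)
    check_int8(b)
    n, k, m = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions do not match")
    # One accumulator per output cell, like one processing element per cell.
    acc = [[0] * m for _ in range(n)]
    # Each step feeds one column of a and one row of b through the grid.
    for step in range(k):
        for i in range(n):
            for j in range(m):
                acc[i][j] += a[i][step] * b[step][j]
    return acc

print(int8_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
print(int8_matmul([[127, 127]], [[127], [127]]))
```

The second call shows why the accumulators have to be wider than the inputs: two INT8 products already sum to a value far outside the INT8 range.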
(it’s still amazing to me that I can download a 15GB blob of bytes and then that blob of bytes can be made to answer questions and write prose)
But the NPU, the thing actually marketed for doing local AI just sits there doing nothing.
It's quite similar with Apple's neural engine, which afaik is used very little for LLMs, even for coreML. I don't think I ever saw it being used in asitop. And I'm sure whatever was using it (facial recognition?) could have easily run on GPU with no real efficiency loss.
I wish every consumer product leader would figure this out.
For example, if you close a youtube browser tab with a comment half written it will pop up an `alert("You will lose your comment if you close this window")`. It does this if the comment is a 2 page essay or "asdfasdf". Ideally the alert would only happen if the comment seemed important but it would readily discard short or nonsensical input. That is really difficult to do in traditional software but is something an LLM could do with low effort. The end result is I only have to deal with that annoying popup when I really am glad it is there.
That is a trivial example but you can imagine how a locally run LLM that was just part of the SDK/API developers could leverage would lead to better UI/UX. For now everyone is making the LLM the product, but once we start building products with an LLM as a background tool it will be great.
It is actually a really weird time, my whole career we wanted to obfuscate implementation and present a clean UI to end users, we want them peeking behind the curtain as little as possible. Now everything is like "This is built with AI! This uses AI!".
I read this post yesterday and this specific example kept coming back to me because something about it just didn't sit right. And I finally figured it out: Glancing at the alert box (or the browser-provided "do you want to navigate away from this page" modal) and considering the text that I had entered takes... less than 5 seconds.
Sure, 5 seconds here and there adds up over the course of a day, but I really feel like this example is grasping at straws.
Granted, it seems the even better UX is to save what the user inputs and let them recover if they lost something important. That would also help for other things, like crashes, which have also burned me in the past. But tradeoffs, as always.
Wouldn't you just hit undo? Yeah, it's a bit obnoxious that Chrome for example uses cmd-shift-T to undo in this case instead of the application-wide undo stack, but I feel like the focus for improving software resilience to user error should continue to be on increasing the power of the undo stack (like it's been for more than 30 years so far), not trying to optimize what gets put in the undo stack in the first place.
Because:
1. Undo is usually treated as an application-level concern, meaning that once the application has exited there is no undo function available. The 'desktop environment' integration necessary for this isn't commonly found.
2. Even if the application is still running, it only helps if the browser has implemented it. You mention Chrome has it, which is good, but Chrome is pretty lousy about just about everything else, so... Pick your poison, I guess.
3. This was already mentioned as the better user experience anyway, so it is not exactly clear what you are trying to add. Did you randomly stop reading in the middle?
I'm not sure we need even local AIs reading everything we do for what amounts to a skill issue.
No, ideally I would be able to predict and understand how my UI behaves, and train muscle memory.
If closing a tab would mean losing valuable data, the ideal UI would allow me to undo it, not try to guess if I cared.
This AI summer is really kind of a replay of the last AI summer. In a recent story about expert systems seen here on Hackernews, there was even a description of Gary Kildall from The Computer Chronicles expressing skepticism about AI that parallels modern-day AI skepticism. LLMs and CNNs will, as you describe, settle into certain applications where they'll be profoundly useful, become embedded in other software as techniques rather than an application in and of themselves... and then we won't call them AI. Winter is coming.
I don't think that's a great example, because you can evaluate the length of the content of a text box with a one-line "if" statement. You could even expand it to check for how long you've been writing, and cache the contents of the box with a couple more lines of code.
An LLM, by contrast, requires a significant amount of disk space and processing power for this task, and it would be unpredictable and difficult to debug, even if we could define a threshold for "important"!
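A minimal sketch of the non-LLM version described above, written as plain Python rather than browser code so it can run anywhere. The thresholds and the names `should_warn` and `DraftCache` are hypothetical, picked only for illustration.

```python
# Hypothetical thresholds; tune to taste.
MIN_CHARS = 20        # shorter input is discarded without asking
MIN_SECONDS = 10.0    # unless the user spent a while writing it

def should_warn(text: str, seconds_typing: float) -> bool:
    """The one-line check: warn only for long text or long effort."""
    return len(text.strip()) >= MIN_CHARS or seconds_typing >= MIN_SECONDS

class DraftCache:
    """Keeps the last non-empty draft per text box so it can be restored."""

    def __init__(self):
        self._drafts = {}

    def save(self, box_id: str, text: str) -> None:
        if text.strip():
            self._drafts[box_id] = text

    def restore(self, box_id: str) -> str:
        return self._drafts.get(box_id, "")

cache = DraftCache()
cache.save("comment-box", "asdfasdf")
print(should_warn("asdfasdf", seconds_typing=1.0))
print(should_warn("a two page essay " * 50, seconds_typing=1.0))
print(cache.restore("comment-box"))
```

The behavior is fully predictable: the same input always gives the same answer, and a wrong guess costs nothing because the draft is cached anyway.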
Sort of like how most of the time when people proposed a non-cryptocurrency use for "blockchain", they had either re-invented Git or re-invented the database. The similarity to how people treat "AI" is uncanny.
Likewise when smartphones were new, everyone and their mother was certain that random niche thing that made no sense as an app would be a perfect app and that if they could just get someone to make the app they’d be rich. (And of course ideally, the idea haver of the misguided idea would get the lion’s share of the riches, and the programmer would get a slice of pizza and perhaps a percentage or two of ownership if the idea haver was extra generous.)
When "asdfasdf" is actually a package name, and it's in reply to a request for an NPM package, and the question is formulated in a way that makes it hard for LLMs to make that connection, you will get a false positive.
I imagine this will happen more than not.
That doesn't sound ideal at all. And in fact highlights what's wrong with AI product development nowadays.
AI as a tool is wildly popular. Almost everyone in the world uses ChatGPT or knows someone who does. Here's the thing about tools - you use them in a predictable way and they give you a predictable result. I ask a question, I get an answer. The thing doesn't randomly interject when I'm doing other things and I asked it nothing. I swing a hammer, it drives a nail. The hammer doesn't decide that the thing it's swinging at is vaguely thumb-shaped and self-destruct.
Too many product managers nowadays want AI to not just be a tool, they want it to be magic. But magic is distracting, and unpredictable, and frequently gets things wrong because it doesn't understand the human's intent. That's why people mostly find AI integrations confusing and aggravating, despite the popularity of AI-as-a-tool.
That is more what I am advocating for, subtle background UX improvements based on an LLM's ability to interpret a user's intent. We had limited abilities to look at an application's state and try to determine a user's intent, but it is easier to do that with an LLM. Yeah like you point out some users don't want you to try and predict their intent, but if you can do it accurately a high percentage of the time it is "magic".
Rose tinted glasses perhaps, but I remember it as a very straightforward and consistent UI that provided great feedback, was snappy and did everything I needed. Up to and including little hints for power users like underlining shortcut letters for the & key.
And nobody relied on them when they were distracting and unpredictable. People only rely on them now because they are not.
LLMs won't ever be predictable. They are designed not to be. A predictable AI is something different from an LLM.
Like what? All those popups screaming that my PC is unprotected because I turned off windows firewall?
Sawstop literally patented this and made millions and seems to have genuinely improved the world.
I personally am a big fan of tools that make it hard to mangle my body parts.
If you want to tell me that llms are inherently non-deterministic, then sure, but from the point of view of a user, a saw stop activating because the wood is wet is really not expected either.
(Though, of course, there certainly are people who dislike sawstop for that sort of reason, as well.)
The hard part is the AI needs to be correct when it does something unexpected. I don't know if this is a solvable problem, but it is what I want.
I want reproducibility not magic.
If your "AI" light switch doesn't turn on the lights, you have to rephrase the prompt.
The funny thing is that this exact example could also be used by AI skeptics. It's forcing an LLM into a product with questionable utility, causing it to cost more to develop, be more resource intensive to run, and behave in a manner that isn't consistent or reliable. Meanwhile, if there was an incentive to tweak that alert based off likelihood of its usefulness, there could have always just been a check on the length of the text. Suggesting this should be done with an LLM as your specific example is evidence that LLMs are solutions looking for problems.
If the computer can tell the difference and be less annoying, it seems useful to me?
We should keep in mind that we're trying to optimize for user's time. "So, she cheated on me" takes less than a second to type. It would probably take the user longer to respond to whatever pop up warning you give than just retyping that text again. So what actual value do you think the LLM is contributing here that justifies the added complexity and overhead?
Plus that benefit needs to overcome the other undesired behavior that an LLM would introduce such as it will now present an unnecessary popup if people enter a little real data and intentionally navigate away from the page (and it should be noted, users will almost certainly be much more likely to intentionally navigate away than accidentally navigate away). LLMs also aren't deterministic. If 90% of the time you navigate away from the page with text entered, the LLM warns you, then 10% of the time it doesn't, those 10% times are going to be a lot more frustrating than if the length check just warned you every single time. And from a user satisfaction perspective, it seems like a mistake to swap frustration caused by user mistakes (accidentally navigating away) with frustration caused by your design decisions (inconsistent behavior). Even if all those numbers end up falling exactly the right way to slightly make the users less frustrated overall, you're still trading users who were previously frustrated at themselves for users being frustrated at you. That seems like a bad business decision.
Like I said, this all just seems like a solution in search of a problem.
Close enough for the issue to me and can't be more expensive than asking an LLM?
Literally "T-shirt with Bluetooth", that's what 99.98% of "AI" stickers today advertise.
I agree this would be a great use of LLMs! However, it would have to be really low latency, like on the order of milliseconds. I don't think the tech is there yet, although maybe it will be soon-ish.
Are you sure about that? It will trigger only for what the LLM declares important, not what you care about.
Is anyone delivering local LLMs that can actually be trained on your data? Or just pre made models for the lowest common denominator?
The ideal, in my view, is that the browser asks you if you are sure regardless of content.
I use LLMs, but that browser "are you sure" type of integration is adding a massive amount of work to do something that ultimately isn't useful in any real way.
It’s already there for Apple developers: https://developer.apple.com/documentation/foundationmodels
I saw some presentations about it last year. It’s extremely easy to use.
Google isn’t running ads on TV for Google Docs touting that it uses conflict-free replicated data types, or whatever, because (almost entirely) no one cares. Most people care the same amount about “AI” too.
No idea if they are AI; Netflix doesn't tell and I don't ask.
AI is just a toxic brand at this point IMO.
Consumer AI has never really made any sense. It's going to end up in the same category of things as 3D TV's, smart appliances, etc.
With more of the compute being pushed off of local hardware they can cheapen out on said hardware with smaller batteries, fewer ports and features, and weaker CPUs. This lessens the pressure they feel from consumers who were taught by corporations in the 20th century that improvements will always come year over year. They can sell less complex hardware and make up for it with software.
For the hardware companies it's all rent seeking from the top down. And the push to put "AI" into everything is a blitz offensive to make this impossible to escape. They just need to normalize non-local computing and have it succeed this time, unlike when they tried it with the "cloud" craze a few years ago. But the companies didn't learn the intended lesson last time when users straight up said that they don't like others gatekeeping the devices they're holding right in their hands. Instead the companies learned they have to deny all other options so users are forced to acquiesce to the gatekeeping.
I don't want AI involved in my laundry machines. The only possible exception I could see would be some sort of emergency-off system, but I don't think that even needs to be "AI". But I don't want AI determining when my laundry is adequately washed or dried; I know what I'm doing, and I neither need nor want help from AI.
I don't want AI involved in my cooking. Admittedly, I have asked ChatGPT for some cooking information (sometimes easier than finding it on slop-and-ad-ridden Google), but I don't want AI in the oven or in the refrigerator or in the stove.
I don't want AI controlling my thermostat. I don't want AI controlling my water heater. I don't want AI controlling my garage door. I don't want AI balancing my checkbook.
I am totally fine with involving computers and technology in these things, but I don't want it to be "AI". I have way less trust in nondeterministic neural network systems than I do in basic well-tested sensors, microcontrollers, and tiny low-level C programs.
Have some half decent model integrated with OS's builtin image editing app so average user can do basic fixing of their vacation photos by some prompts
Have some local model with access to files automatically tag your photos, maybe even ask some questions and add tags based on that and then use that for search ("give me photo of that person from last year's vacation")
Similarly with chat records
But once you start throwing it in cloud... people get anxious about their data getting lost, or might not exactly see the value in subscription
On the other hand everyone non-technical I know under 40 uses LLMs and my 74 year old dad just started using ChatGPT.
You could use a search engine and hope someone answered a close enough question (and wade through the SEO slop), or just get an AI to actually help you.
They've been vastly ahead of everyone else with things like text OCR, image element recognition / extraction, microphone noise suppression, etc.
iPhones have had these features 2-5 years before Android did.
People should not be using their phones while driving anyways. My iPhone disables all notifications, except for Find My notifications, while driving. Bluetooth speaker calls are an exception.
I think there's even better models now but Whisper still works fine for me. And there's a big ecosystem around it.
The only thing that Apple is really behind on is shoving the word (word?) "AI" in your face at every moment when ML has been silently running in many parts of their platforms well before ChatGPT.
Sure we can argue about Siri all day long and some of that is warranted but even the more advanced voice assistants are still largely used for the basics.
I am just hoping that this bubble pops or the marketing turns around before Apple feels "forced" to do a copilot or recall like disaster.
LLM tech isn't going away and it shouldn't, it has its valid use cases. But we will be much better when it finally goes back into the background like ML always was.
- truck drivers that are driving for hours.
- commuters driving to work
- ANYONE with a homepod at home that likes to do things hands free (cooking, dishes, etc).
- ANYONE with airpods in their ears that is not in an awkward social setting (bicycle, walking alone on the sidewalk, on a trail, etc)
every one of these interaction modes benefits from a smart siri.
That’s just the tip of the iceberg. Why can’t I have a siri that can intelligently do multi step actions for me? “siri please add milk and eggs to my Target order. Also let my wife know that i’ll pick up the order on my way home from work. Lastly, we’re hosting some friends for dinner this weekend. I’m thinking Italian. Can you suggest 5 recipes i might like? [siri sends me the recipes ASYNC after a web search]”
All of this is TECHNICALLY possible. There’s no reason apple couldn’t build out, or work with, various retailers to create useful MCP-like integrations into siri. Just omit dangerous or destructive actions and require the user to manually confirm or perform those actions. Having an LLM add/remove items in my cart is not dangerous. Importantly, siri should be able to do some tasks for me in the background. Like on my mac…i’m able to launch Cursor and have it work in agent mode to implement some small feature in my project, while i do something else on my computer. Why must i stare at my phone while siri “thinks” and replies with something stupid lol. Similarly, why can’t my phone draft a reply to an email ASYNC and let me review it later at my leisure? Everything about siri is so synchronous. It sucks.
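A rough sketch of the "run safe actions, gate the destructive ones" idea. Everything here is hypothetical: the `Action` type, the `execute` function, and the example action names are made up for illustration and do not correspond to any real Siri or MCP API.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Action:
    name: str
    run: Callable[[], str]
    destructive: bool = False  # destructive actions need explicit user approval

def execute(actions: List[Action], confirm: Callable[[str], bool]) -> List[str]:
    """Run safe actions right away; skip destructive ones the user has not approved."""
    results = []
    for action in actions:
        if action.destructive and not confirm(action.name):
            results.append(f"{action.name}: skipped (needs confirmation)")
            continue
        results.append(f"{action.name}: {action.run()}")
    return results

plan = [
    Action("add_to_cart", lambda: "milk and eggs added"),
    Action("message_wife", lambda: "message drafted"),
    Action("place_order", lambda: "order placed", destructive=True),
]

# Here the user is not around to confirm, so the order is left for later review.
for line in execute(plan, confirm=lambda name: False):
    print(line)
```

The point of the split is that background work can proceed unattended while anything irreversible waits in a queue for the user.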
It’s just soooo sooo bad when you consider how good it could be. I think we’re just conditioned to expect it to suck. It doesn’t need to.
I have several homepods, and they do what I ask them to do. This includes being the hub of all of my home automation.
Yes there are areas it can improve but I think the important question is how much use would those things actually get vs making a cool announcement, a fun party trick, and then never used again.
We have also seen the failures that come from trying to treat an LLM as a magic box that can just do things for you, so while these things are "technically" possible, they are far from reliable.
What is an NPU? Oh it's a special bit of hardware to do AI. Oh ok, does it run ChatGPT? Well no, that still happens in the cloud. Ok, so why would I buy this?
One day it will be very cool to run something like ChatGPT, Claude, or Gemini locally in our phones but we're still very, very far away from that.
There is useful functionality there. Apple has had it for years, so have others. But at the time they weren’t calling it “AI“ because that wasn’t the cool word.
I also think most people associate AI with ChatGPT or other conversational things. And I’m not entirely sure I want that on my computer.
But some of the things Apple and others have done that aren’t conversational are very useful. Pervasive OCR on Windows and Mac is fantastic, for example. You could brand that as AI. But you don’t really need to; no one cares if you do or not.
I agree. Definitely useful features but still a far cry from LLMs which is what the average consumer identifies as AI.
So we're probably only a few years out from today's SOTA models on our phones.
Unfortunately investors are not ready to hear that yet...
I can see a trend of companies continuing to use AI, but instead portraying it to consumers as "advanced search", "nondeterministic analysis", "context-aware completion", etc - the things you'd actually find useful that AI does very well.
Anyone technical enough to jump into local AI usage can probably see through the hardware fluff, and will just get whatever laptop has the right amount of VRAM.
They are just hoping to catch the trend chasers out, selling them hardware they won't use, confusing it as a requirement for using ChatGPT in the browser.
But when I come on HN and see people posting about AI IDEs and vibe coding and everything, I'm led to believe that there are developers that like this sort of thing.
I cannot explain this.
But the fact remains that I'm producing something for a machine to consume. When I see people using AI to e.g. write e-mails for them that's where I object: that's communication intended for humans. When you fob that off onto a machine something important is lost.
It's okay, you'll just forget you were ever able to know your code :)
But I wasn't talking about forgetting one language or another, I was talking about forgetting to program completely.
I've also had luck with it helping with debugging. It has the knowledge of the entire Internet and it can quickly add tracing and run debugging. It has helped me find some nasty interactions that I had no idea were a thing.
AI certainly has some advantages in certain use cases, that's why we have been using AI/ML for decades. The latest wave of models brings even more possibilities. But of course, it also brings a lot of potential for abuse and a lot of hype. I, too, am quite sick of it all and can't wait for the bubble to burst so we can get back to building effective tools instead of making wild claims for investors.
"This package has been removed, grep for string X and update every reference in the entire codebase" is a great conservative task; easy to review the results, and I basically know what it should be doing and definitely don't want to do it.
"Here's an ambiguous error, what could be the cause?" sometimes comes up with nonsense, but sometimes actually works.
That usually means you're missing something, not that everyone else is.
this is their aim, along with rabbiting on about "inevitability"
once you drop out of the SF/tech-oligarch bubble the advocacy drops off
It also looks like names are being changed, and the business laptops are going with a dell pro (essential/premium/plus/max) naming convention.