Can I run AI locally?

Posted by ricardbejarano 8 hours ago

476 points | 127 commentspage 2

carra 4 hours ago|

Having the rating of how well the model will run for you is cool. I miss to also have some rating of the model capabilities (even if this is tricky). There are way too many to choose. And just looking at the parameter number or the used memory is not always a good indication of actual performance.

Felixbot 4 hours ago||

[flagged]

itigges22 2 hours ago|

[flagged]

JulianPembroke 2 hours ago||

[flagged]

phelm 5 hours ago||

This is awesome, it would be great to cross reference some intelligence benchmarks so that I can understand the trade off between RAM consumption, token rate and how good the model is

freediddy 3 hours ago||

i think the perplexity is more important than tokens per second. tokens per second is relatively useless in my opinion. there is nothing worse than getting bad results returned to you very quickly and confidently.

ive been working with quite a few open weight models for the last year and especially for things like images, models from 6 months would return garbage data quickly, but these days qwen 3.5 is incredible, even the 9b model.

sroussey 3 hours ago|

No, getting bad results slowly is much worse. Bad results quickly and you can make adjustments.

But yes, if there is a choice I want quality over speed. At same quality, I definitely want speed.

cafed00d 3 hours ago||

Open with multiple browsers (safari vs chrome) to get more "accurate + glanceable" rankings.

Its using WebGPU as a proxy to estimate system resource. Chrome tends to leverage as much resources (Compute + Memory) as the OS makes available. Safari tends to be more efficient.

Maybe this was obvious to everyone else. But its worth re-iterating for those of us skimmers of HN :)

bearjaws 1 hour ago||

So many people have vibe coded these websites, they are posted to Reddit near daily.

rcarmo 2 hours ago||

This is kind of bogus since some of the S and A tier models are pretty useless for reasoning or tool calls and can’t run with any sizable system prompt… it seems to be solely based on tokens per second?

SXX 2 hours ago||

Sorry if already been answered, but will there be a metric for latency aka time to first token?

Since I considered buying M3 Ultra and feel like it the most often discussed regarding using Apple hardware for runninh local LLMs. Where speed might be okay, but prompt processing can take ages.

teaearlgraycold 2 hours ago|

Wait for the M5 Ultra. It will get the 4x prompt processing speeds from the rest of the M5 product line. I hear rumors it will be released this year.

A7OM 2 hours ago|

[flagged]

More comments...