Posted by CrankyBear 4 hours ago
So yeah, I think models on local hardware will be quite common soon among the tech savvy (such as people creating software).
E.g. Grok isn't truly multi-modal; it has a callable tool, a separate VLM that it invokes on image URLs or files (for a long time it was grok-1.5v, which was pretty bad, but I think they have upgraded now).
And then you have the small summarizer models for the CoT/thought traces, the guidable summarizer models for the standard browse tools, etc.
There's a ton of stuff that can use an aging GPU.
I do hope you're right that it will get cheaper over time (it should), but right now 32GB of VRAM is not affordable for a lot of people. You're talking ~$4500 just for the GPU, or $800-ish used if you can find one.
It's a tad less efficient and a bit more of a hassle, but still a good experience for only a fraction of the price.
Gotta remember inflation here.
$1K in 1995 was roughly equivalent to $2K now and wouldn't have been a particularly "good" machine then.
In 1982 the Commodore 64 started at about $600, also roughly $2K today.
If you outgrew that, beefier machines back then cost A LOT. It was easy to find $2K+ towers and (especially) laptops even into the 2000s, and a lot of those would be $5K+ equivalent today.
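If anyone wants to sanity-check the inflation math above, here's a rough sketch. The CPI figures are approximate US CPI-U annual averages that I'm assuming for illustration (they are not from this thread), and `adjust_for_inflation` is just a made-up helper name.

```python
# Approximate US CPI-U annual averages (1982-84 = 100).
# ASSUMPTION: rounded values for illustration only; check BLS data for exact figures.
APPROX_CPI = {1982: 96.5, 1995: 152.4, 2024: 313.7}


def adjust_for_inflation(amount, from_year, to_year=2024, cpi=APPROX_CPI):
    """Scale a dollar amount by the ratio of the CPI in the two years."""
    return amount * cpi[to_year] / cpi[from_year]


if __name__ == "__main__":
    examples = [
        ("$1K PC in 1995", 1000, 1995),
        ("Commodore 64 at about $600 in 1982", 600, 1982),
    ]
    for label, price, year in examples:
        adjusted = adjust_for_inflation(price, year)
        print(f"{label}: about ${adjusted:,.0f} in 2024 dollars")
```

With these assumed CPI values both examples come out close to $2K, which matches the ballpark figures quoted above.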
I imagine having multiple providers competing will drastically drive down the price of hosted versions of open-weight models.
Certainly the transistors/chip or transistors/$ or flops/$ have not been progressing at the same exponential rate as during 1970-2010. There is still progress, but it's rather slower.
Especially because the world is likely to persist, at least for a while, in a state where computing hardware demand drastically exceeds supply, resulting in high prices for hardware. So why wouldn't you want to max out utilisation and amortize costs, at least for typical (non-sensitive) use cases?
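To put rough numbers on the utilisation argument, here's a back-of-the-envelope sketch. Everything in it is an assumption for illustration: the 3-year lifetime, the utilisation levels, and the helper name `cost_per_busy_hour` are made up, and the $4500 price is just the GPU figure mentioned upthread. It ignores resale value, cooling, and staffing.

```python
HOURS_PER_YEAR = 24 * 365


def cost_per_busy_hour(hardware_price, lifetime_years, utilisation,
                       power_kw=0.0, price_per_kwh=0.0):
    """Hardware cost spread over the hours the device is actually busy,
    plus electricity for each busy hour (idle power draw is ignored)."""
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in (0, 1]")
    busy_hours = lifetime_years * HOURS_PER_YEAR * utilisation
    return hardware_price / busy_hours + power_kw * price_per_kwh


if __name__ == "__main__":
    # ASSUMPTION: a home user keeps the GPU busy 5% of the time,
    # a shared hosted deployment keeps it busy 80% of the time.
    for label, utilisation in [("home, 5% busy", 0.05), ("hosted, 80% busy", 0.80)]:
        cost = cost_per_busy_hour(4500, lifetime_years=3, utilisation=utilisation)
        print(f"{label}: ${cost:.2f} of hardware cost per busy hour")
```

The point is only the shape of the curve: with the same hardware price, the amortized cost per useful hour scales inversely with utilisation, so under these assumptions a card that is busy 16x more often is 16x cheaper per hour of work.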
Possibly it's the same price range, allowing for inflation.
> It was only in 2025, as memory prices began an unprecedented surge, that the memory makers started to build new fabs targeted at HBM, all slated to start producing chips in 2027 or 2028.
If you want to argue that this is different from all previous RAM shortages, you can, but the burden of proof is on you to show the difference.
This time demand doesn't stop. There is exponential demand for tokens.
Started with computers around 2009 and later bought an oldish computer (a Pentium 4 PC) for the equivalent of 50 USD. Code::Blocks and Python IDLE were free at the time (C and Python were the first languages I learned). The barrier to programming has always been low, as the only things you needed were books (the internet made things easier) and access to a PC (I had friends with laptops and the lab at my school).
By all accounts, all AI companies with names starting with "Open" are doing proprietary stuff. All models delivered for free as "open models" are just freeware, as no source is really provided.
Collaboratively trained open-weight models, is my understanding.
https://webtv.un.org/en/asset/k14/k14ej1ucqu?kalturaStartTim...
(if that link doesn't work, it starts about 12 minutes in)
I predict that mech interp and things like Neuronpedia will matter more and more over time, and the frontier providers are disincentivized from providing those tools.
What has Linux won? Servers? Sure.
The bitter lesson would claim that we only need one model with more data thrown at it, but I’m a bit skeptical that we’ll end up with “the one true way” of building a model, and I think there will be model tradeoffs that we can pick from as the industry matures.
Edit: come to think of it, I think the ratio of one drop to one bucket vastly overestimates the trainer's effort.
> the investors who backed these foundational model companies who will hold the bag
It's awfully bold to assume that private credit is who will be holding the bag here. The IPOs are coming to shift the risk to the index funds & retail. Once insider lock-up periods expire, I suspect a massive sell-off.
> On the one hand you have—the point you’re making Woz—is that information sort of wants to be expensive because it is so valuable—the right information in the right place just changes your life. On the other hand, information almost wants to be free because the costs of getting it out is getting lower and lower all of the time. So you have these two things fighting against each other.
Information may want to be free, but the humans creating it still need to eat and pay rent. Copyright isn't necessarily unethical so much as it's a flawed tool, and it lasts far too long in the law's current state. It needs to last only long enough for the original creator to profit from the work, for a specific duration of time, and then that's it.
It is only because of rampant greed and capitalism that information is not free. There is nothing inherent about the collective knowledge of mankind that lends itself to being proprietary and expensive. Otherwise human society literally could not have evolved.
I'd struggle to find an idea, art, technique, etc. that wasn't an extension of something that came before it.
Preventing a handful of massive companies from continuing to be the only ones able to make money off that, not only unimpeded but with overt or covert state assistance (regulatory capture, ownership, whatever), at least puts an end to the worst of the abuse.
If we have broken the idea of copyright, and we do indeed appear to have broken the idea of copyright, why should trillion dollar companies owned and controlled by strange or psychopathic weirdos and their circle of investors be the only ones benefiting? Why do Sam and Dario or the US government get to decide when and for whom the tap is turned on?
Great analogy to the fear that the printing press was really bad news because it enabled the rabble to get roused.
All that's needed is another sovereign debt crisis to spark what is essentially dry tinder and I think the EU is a lot closer to collapsing than anyone even remotely realizes.
Even if it's too expensive to run the models on your own personal hardware, open weights may still make it possible to take power back from the big private corporations.
We need large scale open weights models just as capable as what's at the frontier.
And we need the ability to rent compute and spin up the weights easily. One-click, easy enough for anyone. Easier than nerd tools like ComfyUI, Claw, and node graph garbage.
Freedom is owning very large scale weights. Anything less is subsistence.
My hunch is that the energy/water usage of the data centers is a whole lot more efficient than everyone running at home, but I'd be interested in seeing real data on that.
So: if you're running the models on your own machine, presumably you're not running them as often, and air cooling is sufficient. But, at the same time, this is less efficient in terms of hardware use; the data centers need water cooling specifically because they're getting more bang for their buck from their hardware, by running it harder.
So that's the tradeoff: more hardware-use efficiency means more water usage.
On the energy front, I assume running at home is less efficient, but I also think there is a tradeoff between efficiency and freedom; that's why I have my own hardware.
This is the wrong approach that will turn us into serfs. We need big honking models that do what the leading foundation hyperscaler models do to within a few percentage points of measured performance.
The small-scale models are not productive, and the duct tape solutions built on top of them are hobbyist-tier "year of Linux on desktop" toys.
I imagine fedora-wearing, crypto-shilling, coupon-cutting boffins every time I see the small-weights thing lauded as the future. This is the Pine Phone F-Droid of AI.
"SMS works most of the time on my phone, I swear! I don't really need my banking app!"
That is not big model energy.
Nothing outside of the top ten is worth spending any time on, and we need to focus on models that bridge the gap.
You're talking about impractical toys for highly technical people wasting their own time. That doesn't move the needle or have any economic impact on the competitive landscape.
We need sharp teeth that bite at the legs of the top-tier foundation labs and hold them back from running away with the prize.
We've been through this time and time again over the last thirty years. It's the same shaped problem as before. We don't need toys - we need real infra for real people paying money to do work. Not freeware for freeloaders who don't spend and invest in the problem space.
Large models fit that precisely, because they force investment into a wide variety of open infra, routers, inference engines, etc. Not to mention the weights ecosystem itself.
We need the right tool for the job. Certain models have a minimum energy expense no matter what the task is, and that's often wasted, both on small-scale tasks and on repetition.
There is a place and a need for large models, local models, and single-purpose models. The same way there is a need for both HPC and single-board computers.