Posted by vednig 9 hours ago
If this is a serious concern, why hasn't some red teaming effort demonstrated this possibility already? The fact of the matter is that ablation can't give a model world knowledge it doesn't have as part of training; it can only make the model confabulate. The "nasty" areas of concern are most notable for their world-knowledge requirements, which is where local models are at their weakest anyway.
I'm sure they have, but as usual we are a reactive society rather than a proactive one. Only once an incident has occurred do we have the momentum to act.
I always wondered if 1,000 1M-parameter models fine-tuned to specific tasks with a small router could perform as well as 100B models.
And I know this is roughly how MoE works, but current MoE models still require training the model as a whole, and big players don’t have an incentive to change that.
But the open source community does…
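To make the "many tiny specialists plus a small router" idea concrete, here is a toy sketch. Everything in it is hypothetical: the specialist names, the keyword router, and the stub functions standing in for fine-tuned 1M-parameter models. A real router would be a small learned classifier, not keyword matching. It only shows the shape of the setup and the parameter arithmetic.

```python
from typing import Callable, Dict, Set, Tuple

Specialist = Callable[[str], str]

# Parameter arithmetic from the comment: 1,000 models x 1M params each.
N_SPECIALISTS = 1_000
PARAMS_PER_SPECIALIST = 1_000_000
TOTAL_SPECIALIST_PARAMS = N_SPECIALISTS * PARAMS_PER_SPECIALIST  # 1B in total
DENSE_PARAMS = 100_000_000_000                                    # the 100B baseline
SIZE_RATIO = DENSE_PARAMS // TOTAL_SPECIALIST_PARAMS              # 100x fewer weights

# Hypothetical stand-ins for independently fine-tuned small models.
SPECIALISTS: Dict[str, Specialist] = {
    "translate": lambda p: f"[translate model] {p}",
    "sql": lambda p: f"[sql model] {p}",
    "general": lambda p: f"[general model] {p}",
}

# Hypothetical routing table: keywords that suggest each specialist.
KEYWORDS: Dict[str, Set[str]] = {
    "translate": {"translate", "french", "german"},
    "sql": {"sql", "query", "table"},
}


def route(prompt: str, default: str = "general") -> str:
    """Pick the specialist whose keywords overlap the prompt the most."""
    words = set(prompt.lower().split())
    best = max(KEYWORDS, key=lambda name: len(words & KEYWORDS[name]))
    # Fall back to the default specialist when nothing matches at all.
    return best if words & KEYWORDS[best] else default


def run(prompt: str) -> Tuple[str, str]:
    """Route the prompt, then call only the chosen specialist."""
    name = route(prompt)
    return name, SPECIALISTS[name](prompt)


if __name__ == "__main__":
    print(run("translate this sentence to french"))
    print(run("hello there"))
    print(f"specialists hold {SIZE_RATIO}x fewer parameters than the dense model")
```

Note that the specialists together hold 1B parameters, 100x fewer than the 100B model, and each request touches just one of them. Each specialist can also be trained and swapped independently, which is the contrast with MoE drawn in the next sentence; whether quality would hold up is exactly the open question.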
to me Open Source, like Free Software, is something i can run on my own computer. any AI system that runs on a computer that i do not control is by my definition not Open Source.
so how then can Open Source AI win? it can't even compete. even if we collect enough money and create a dedicated Open Source organization to build and run a community owned AI datacenter, how does that help?
so what exactly is the demand here?
Right now there are a few people who can run a 1T model at home, even fewer who can run a 5T model, and probably single digits who can run a 10T model.
But if an open source 10T model were available, you can be sure people would find new ways to quantize it, new ways to configure hardware, and new ways to think about problems that would make it useful.
1T+ models (DeepSeek v4, Kimi K2.6 etc) are available as open weights now, and for ~$5000-$10000 you can run them usefully at home. 2 years ago no one was contemplating that.
$250K to run a 10T model might be possible now. There are many companies that will pay that, and that will push the tools and techniques downwards for the rest of us.
This is not true at all. It would be open source if you could download it and run it anywhere that is capable, and are free to move it and modify it as much as you want.
Just because you don't have a computer at home powerful enough doesn't mean it isn't open source.
Fun fact: Qwen was not initially an Apache-licensed project; it was based on a custom license from Alibaba that restricts commercial use: https://github.com/QwenLM/Qwen/blob/ba2d85a13b28ed1ee0dde2d6.... There's no guarantee that they won't just switch it back later.
Kudos to them for switching to the Apache License, of course. BUT, they're still a for-profit company. So is DeepSeek, btw.
Never, ever, subscribe. When you subscribe, they win. They cornered the silicon market to force you to subscribe. Don't be a sub, or at least keep your sub tendencies in the bedroom. ;^)
But I am going to need a much beefier machine to get it to the point where it can do any but very trivial dev tasks acceptably fast, and I'm going to need a much beefier model, perhaps one not so aggressively quantized, to keep it on task without the wheels completely falling off. Already we're talking serious money outlay, perhaps still within my programmer salary to accommodate, but just barely. And we're not even anywhere near the performance characteristics a frontier model can support.
Qwen 2.5 72B is surprisingly capable, almost on par with GPT-4o if not a little better. You can run it on a 128GB Mac Studio with 8-bit quantization. You need about 77GB for the weights and ~15GB for your context window & cache.
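Those memory figures can be sanity-checked with back-of-the-envelope arithmetic. This is a rough sketch under assumptions that are not stated in the comment: about 72.7B parameters, a Q8_0-style 8-bit format that costs roughly 8.5 bits per weight once per-block scales are counted, and an architecture of 80 layers with 8 KV heads of dimension 128, cached in fp16. Check the model card before relying on any of them.

```python
GB = 1e9  # decimal gigabytes, as used in the comment


def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Bytes needed to hold the weights at a given average bit width."""
    return n_params * bits_per_weight / 8


def kv_cache_bytes(n_tokens: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_value: int = 2) -> int:
    """Bytes of KV cache: one key and one value vector per layer per token."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * n_tokens


# Assumed model shape (see the lead-in); none of these come from the thread.
N_PARAMS = 72.7e9
BITS_PER_WEIGHT = 8.5          # 8-bit values plus per-block scale overhead
N_LAYERS, N_KV_HEADS, HEAD_DIM = 80, 8, 128

weights = weight_bytes(N_PARAMS, BITS_PER_WEIGHT)
per_token = kv_cache_bytes(1, N_LAYERS, N_KV_HEADS, HEAD_DIM)
tokens_in_15gb = int(15 * GB // per_token)

if __name__ == "__main__":
    print(f"weights: {weights / GB:.1f} GB")
    print(f"KV cache per token: {per_token / 1024:.0f} KiB")
    print(f"tokens that fit in 15 GB of cache: {tokens_in_15gb}")
```

Under these assumptions the weights come to about 77 GB and 15 GB of cache holds roughly 45k tokens of context, which lines up with the numbers quoted above and leaves headroom on a 128GB machine.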
Pricing remains to be seen, but there are also those new Nvidia laptops coming out. The Surface Laptop Ultra should have 128GB RAM w/ a Blackwell GPU; they're saying 1 petaflop of AI compute, if you can tolerate Windows (no idea if it'll boot Linux until the hardware is out).
These models are roughly a year or less behind the frontier models. We really just need hardware to catch up and alleviate the price pressure on RAM.
Maybe an unpopular opinion here (seeing how Y Combinator is his baby), but I think OpenAI and Sam Altman should be financially decimated for cornering the DRAM market. What he's done is a step or two removed from what the Hunt brothers did. His buy-up of future DRAM silicon has measurably harmed personal computing, and he should not get to walk away with a 'win' from it.
I don’t think so. A locally run model only needs to serve one or a few people. It seems possible to run a DeepSeek v4 model at full capacity on a server costing 200k USD. Very expensive but not impossible.
Factor in hardware and software improvements over time, and the fact that most people may just need to run a smaller, quantized model, and it should take a PC at the 10k USD scale.
One day an open source model will reach a "good enough" level, maybe around where the current frontier is, and most people will use that.
You can one-shot a port of Linux to Rust and stop contributing to open source.
The value of software is going to tend towards zero. The value of the software developer the same.
Anthropic is now a kingmaker. It gets to decide which businesses get the expensive private model that can generate entire business functions at the drop of a hat. If you can't afford the price tag, then competition in the market is not for you.
Computing is no longer "personal". It's for big biz only.
Touch grass brother. Seriously.