Posted by vednig 3 days ago
A society that maximizes individual freedom with no guardrails also maximizes freedom for fraudsters, polluters, violent extremists, drunk drivers, kiddie-porn-producing social networking xAIs, and people who use power to dominate others. At that point, the liberty of the strongest starts eroding the liberty of everyone else.
Funny how 'current-year ideology' never seems to include libertarianism. Be the fish that notices the unregulated toxic polluted water. Also be the fish that notices it's swimming in libertarian Kool-Aid.
Edit: Speak for yourself about how frustratingly hampered you are by society's guardrails. Stop whining and predictably regurgitating tired meaningless libertarian bullshit slop like a human stochastic parrot, and just write your own racist poetry and photoshop your own kiddie porn without the help of an LLM, if you really must. But restrict your drunk driving to off road, with just your own family in your CyberTruck, so you only cleanse your own genes from the pool.
rustcleaner> homosexual and transsexual topics
OK boomer.
Or are we still collectively brainwashed by the strategic false equivalence established by Big AI CMOs?
I never thought I'd have to utter the expression "open as in beer".
The blatant attempt at manipulating vocabulary here is... quite blatant.
The weights are the useful artifact here. You can modify them, fine tune them and do what you want with them.
Unlike binary software there is nothing limiting that.
It is also useful to have access to the training recipes and to some extent the data. But I'm of the opinion that learning on something is not copyright infringement, so there are many circumstances where distributing the raw training data will not be possible.
For me this is like Open Office: it is open source, and largely inspired by and learned from Microsoft Office. But they don't need to distribute MS Office for Open Office to be Open Source.
In addition there are models that meet the criteria you appear to propose. The AllenAI models are a good example.
While if "Open Office" switches to a more problematic license at some point, the existing source has all you need for an organization to support the project without regard to the original company (this has happened already!). If Qwen decides to stop distributing models for download, you're basically stuck, _even_ if you have unlimited resources, it's not clear how the released weights help you; your best bet is to start almost from scratch. This has also happened...
These models are not "Open" by any definition of the word. They are just freely redistributable. You can justify yourself in whatever way you want re a cowboy approach to copyright, but this doesn't change the fact that this is not open, and has almost none of the benefits of open, and therefore it is a huge abuse of the word "Open".
Ironically, about the only thing that is copyrightable here is the sum of the training data (possibly) _AND_ the software used to build the model (most definitely). The model itself most likely isn't (databases are not copyrightable), which makes it even more pointless to abuse the word "open" for it. All the value is in the former two.
This is completely wrong, and sort of shows why what you are saying is not a problem at all.
You can post-train any LLM very easily without access to the original training data.
People do it all the time.
Cursor post-training Kimi K2 is a great example.
> If Qwen decides to stop distributing models for download, you're basically stuck, _even_ if you have unlimited resources, it's not clear how the released weights help you; your best bet is to start almost from scratch.
What are you talking about? You just post-train it.
There is exactly zero difference before and after they stop distributing it. People don't have access to the training data now (when they are distributing it) and post-train very successfully.
What would you even use the training data for?
Are you claiming this is e.g. what Alibaba spends their time doing?
My point is that the usefulness of this is limited _in comparison to the one provided by having their training data AND mechanisms_.
Not most of the time (pre-training takes a long time), but post-training is where most of the value is, yes.
Famously it is all that OpenAI did between GPT 4o and GPT 5.3 (or 5.2?) - they didn't manage to complete a pre-training run[1], and all their progress was done with post-training (!)
Post-training is what Cursor spends its time doing, and that has built a model that is competitive with the best coding models out there.
It isn't limited at all.
If you want to complain about something not being open source, complain about the lack of good open source RL environments (Prime Intellect excepted).
[1] https://newsletter.semianalysis.com/p/tpuv7-google-takes-a-s...