Open source AI must win

Posted by vednig 3 days ago

Open source AI must win (opensourceaimustwin.com)

1593 points | 479 comments | page 15

gnarlouse 3 days ago|

[flagged]

devkakadiya 3 days ago||

[flagged]

devkakadiya 3 days ago||

[flagged]

rustcleaner 3 days ago||

[flagged]

dyauspitr 3 days ago||

Yeah perfect, the youth are already degenerates. Let’s put an always-on demon in their room to be their life coach. I got off the libertarian train a long time ago; it’s as stupid as anarchists.

DonHopkins 3 days ago||

Instead, how about an open source robot that happily punches Nazis, drunk drivers, people who want to write racist poems, and unhinged trillionaire ketamine addicts who pee their pants, throw Nazi salutes, and hate their own trans daughters?

rustcleaner 3 days ago||

[flagged]

DonHopkins 3 days ago||

Then perhaps it should also punch foaming at the mouth libertarians, too.

A society that maximizes individual freedom with no guardrails also maximizes freedom for fraudsters, polluters, violent extremists, drunk drivers, kiddie-porn-producing social networking xAIs, and people who use power to dominate others. At that point, the liberty of the strongest starts eroding the liberty of everyone else.

Funny how 'current-year ideology' never seems to include libertarianism. Be the fish that notices the unregulated toxic polluted water. Also be the fish that notices it's swimming in libertarian Kool-Aid.

Edit: Speak for yourself about how frustratingly hampered you are by society's guardrails. Stop whining and predictably regurgitating tired meaningless libertarian bullshit slop like a human stochastic parrot, and just write your own racist poetry and photoshop your own kiddie porn without the help of an LLM, if you really must. But restrict your drunk driving to off road, with just your own family in your CyberTruck, so you only cleanse your own genes from the pool.

DonHopkins 3 days ago|||

Also:

rustcleaner> homosexual and transsexual topics

OK boomer.

rustcleaner 3 days ago|||

[flagged]

CharlesW 3 days ago|

Can we assume that the author isn't using "Opensource" to mean "Openweights"?

Or are we still collectively brainwashed by the strategic false equivalence established by Big AI CMOs?

AshamedCaptain 3 days ago|

On this very thread you already have people talking about "open weights" and similar nonsense. What is open about them? They're free to download, but that hardly qualifies as open. Where is the source? Where are the instructions to modify and build your own?

I never thought I'd have to utter the expression "open as in beer".

The blatant attempt at manipulating vocabulary here is... quite blatant.

nl 3 days ago|||

I'm a strong proponent of Open Source (TM) but I disagree with this take.

The weights are the useful artifact here. You can modify them, fine tune them and do what you want with them.

Unlike binary software there is nothing limiting that.

It is also useful to have access to the training recipes and to some extent the data. But I'm of the opinion that learning on something is not copyright infringement, so there are many circumstances where distributing the raw training data will not be possible.

For me this is like Open Office: it is open source, and largely inspired by and learned from Microsoft Office. But they don't need to distribute MS Office for Open Office to be Open Source.

In addition there are models that meet the criteria you appear to propose. The AllenAI models are a good example.

AshamedCaptain 3 days ago||

The analogy falls apart very quickly. Without the training data, your modifications amount to virtually nothing compared to what these "versions" are, and the idea that you can maintain and improve on these models without the continual support of the company that owns the training data AND harnesses AND in general build instructions is not very credible. This is why it's not rare that they "dump" old versions as freeware but at some point switch to not distributing them, and mostly get away with it. As this is really not open, and the threat of an effective fork is therefore non-existent, the pressure for anyone who has released freeware models to "go SaaS" is too high.

While if "Open Office" switches to a more problematic license at some point, the existing source has all you need for an organization to support the project without regard to the original company (this has happened already!). If Qwen decides to stop distributing models for download, you're basically stuck, _even_ if you have unlimited resources, it's not clear how the released weights help you; your best bet is to start almost from scratch. This has also happened...

These models are not "Open" by any definition of the word. They are just freely redistributable. You can justify yourself in whatever way you want re a cowboy approach to copyright, but this doesn't change the fact that they are not open and have almost none of the benefits of open, and therefore it is a huge abuse of the word "Open".

Ironically, about the only thing that is copyrightable here is the sum of the training data (possibly) _AND_ the software used to build the model (most definitely). The model itself most likely isn't (databases are not copyrightable), which makes it even more pointless to abuse the word "open" for it. All the value is in the former two.

nl 1 day ago||

> The analogy falls apart very quickly. Without the training data, your modifications amount to virtually nothing compared to what these "versions" are, and the idea that you can maintain and improve on these models without the continual support of the company that owns the training data AND harnesses AND in general build instructions is not very credible.

This is completely wrong, and sort of shows why what you are saying is not a problem at all.

You can post-train any LLM very easily without access to the original training data.

People do it all the time.

Cursor post-training Kimi K2 is a great example.
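
To make the mechanics concrete, here is a toy sketch of post-training, assuming a two-parameter linear model rather than anything resembling a real LLM pipeline. It starts from "released" weights, runs gradient descent on data you collected yourself, and never touches whatever data produced those weights. The starting weights, the dataset, and the names `predict`, `mse`, and `post_train` are all made up for illustration.

```python
import random


def predict(weights, x):
    """Toy linear model: w0 + w1 * x."""
    return weights[0] + weights[1] * x


def mse(weights, data):
    """Mean squared error of the model on (x, y) pairs."""
    return sum((predict(weights, x) - y) ** 2 for x, y in data) / len(data)


def post_train(weights, new_data, lr=0.05, steps=500):
    """Gradient descent that starts from the released weights.

    Only new_data is used; the original training set is never needed.
    """
    w0, w1 = weights
    n = len(new_data)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w0 and w1.
        g0 = sum(2 * (w0 + w1 * x - y) for x, y in new_data) / n
        g1 = sum(2 * (w0 + w1 * x - y) * x for x, y in new_data) / n
        w0 -= lr * g0
        w1 -= lr * g1
    return [w0, w1]


# Hypothetical "released" weights; the data behind them is assumed unavailable.
released = [1.0, 2.0]

# Data for a new task that we gathered ourselves: y = 1.5 + 3x plus small noise.
rng = random.Random(0)
new_data = [(k / 10, 1.5 + 3 * (k / 10) + rng.gauss(0, 0.01)) for k in range(-10, 11)]

tuned = post_train(released, new_data)
loss_before = mse(released, new_data)
loss_after = mse(tuned, new_data)
print(f"loss before: {loss_before:.4f}, loss after: {loss_after:.4f}")
```

The update loop only ever reads `released` and `new_data`, which is the structural point. Whether the result is as good as what the original lab could do with its full data and tooling is a separate question that this toy does not settle.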

> If Qwen decides to stop distributing models for download, you're basically stuck, _even_ if you have unlimited resources, it's not clear how the released weights help you; your best bet is to start almost from scratch.

What are you talking about? You just post-train it.

There is exactly zero difference before and after they stop distributing it. People don't have access to the training data now (when they are distributing it) and post-train very successfully.

What would you even use the training data for?

AshamedCaptain 1 day ago||

> You can post-train any LLM very easily without access to the original training data.

Are you claiming this is e.g. what Alibaba spends their time doing?

My point is that the usefulness of this is limited _in comparison to the one provided by having their training data AND mechanisms_.

nl 21 hours ago||

> what Alibaba spends their time doing?

Not most of the time (pre-training takes a long time), but post-training is where most of the value is, yes.

Famously it is all that OpenAI did between GPT 4o and GPT 5.3 (or 5.2?) - they didn't manage to complete a pre-training run[1], and all their progress was done with post-training (!)

Post-training is what Cursor spends their time doing, and that has built a model that is competitive with the best coding models out there.

It isn't limited at all.

If you want to complain about something not being open source, complain about the lack of good open source RL environments (Prime Intellect excepted).

[1] https://newsletter.semianalysis.com/p/tpuv7-google-takes-a-s...

cortesoft 3 days ago||||

What would the 'source' be for an LLM? There is the structure and the weights; there is no 'source'.

CharlesW 3 days ago||

In case you're not just trolling, please learn how "the weights", which are analogous to a compiled executable, are made.

cortesoft 1 day ago||

The weights are created through training. The 'source' would be the training data, which is going to be a massive amount of data, and is not something that could just be easily shared.

AshamedCaptain 1 day ago||

And?

singpolyma3 3 days ago|||

There is no source because it's not software. You can of course modify and make your own.