Posted by mkl 5 days ago
If Llama released everything that the most zealous opponents of weights=source demand they release, under the same license that they're currently offering the weights under, we'd still be left with something that falls cleanly into the category of Source Available. It's a generous Source Available, but it removes many of the freedoms that are part of both the Open Source and Free Software Definitions.
Fighting over weights vs source implicitly cedes the far more important ground in the battle over the soul of FOSS, and that will have ripple effects across the industry in ways that ceding weights=source never would.
Once we've resolved the problem of using the word "Open" incorrectly, I'm happy to have a conversation about what the preferred form for modification (i.e. source) of an LLM should be. But that is the less important and far more esoteric discussion (and one about which reasonable people can and do disagree), to the point where it's merely a distraction from the incredibly meaningful and important problem of calling something "Open Source" while attaching an Acceptable Use Policy to it.
I don't think this is true. If someone said "look, my software is open source" and by "source" they meant the binary they shipped, the specific definition of "open" they chose to use would not matter much for the sort of things I'd like to do with an open source project. Both are important.
If they released the binary as "Open Source" but had a long list of things I wasn't allowed to do with it, the fact that they didn't release the source code would be of secondary concern to the fact that they're calling it "Open" while it actually has a trail of legal landmines waiting to bite anyone who tries to use it as free software.
And that's with a clear cut case like a binary-only release. With an LLM there's a lot of room for debate about what counts as the preferred form for making modifications to the work (or heck, what even counts as the work). That question is wide open for debate, and it's not worth having that debate when there's a far more egregious problem with their usage.
The closest thing to open source would be to have open training data. The weights are the binary, the training data is the source, and the process of getting the weights is the compilation process.
Fine-tuning or whatever is just modding the binaries. Remixing different layers is creating a workflow pipeline by combining different functions of a binary software package with components from other binary software packages.
Common misconception. Weights are not binary. Weights are hardcoded values that you load into an (open-source or closed-source) engine and you run that engine. The source code for LLMs is both in the architecture (i.e. what to do with those hardcoded values) and the inference engines.
As opposed to binaries, you can modify weights. You can adjust them, tune them for downstream tasks, and so on. More importantly, in theory you the downloader and "company x" the releaser of the model use the same methods and technologies to modify the weights (in contrast to a binary release, where you can only modify the machine code while the creator can modify the source code and recompile).
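The distinction above can be illustrated with a toy sketch: the "engine" below is not any real model's architecture, and the weight values and gradient step are made up purely for illustration. The point is that the weights are plain numbers loaded into code, and "fine-tuning" is just adjusting those numbers with the same kind of maths the original trainer would have used.

```python
import numpy as np

# A toy "inference engine": the architecture is code that says
# what to do with the weights (here, a single sigmoid unit).
def engine(weights, x):
    return 1.0 / (1.0 + np.exp(-(x @ weights)))

# The "released weights" are just hardcoded numbers, not executable code.
weights = np.array([0.5, -0.25, 1.0])

x = np.array([1.0, 2.0, 3.0])
y_before = engine(weights, x)

# "Fine-tuning": adjust the weights directly with a gradient step,
# using the same kind of update the original trainer would use.
target = 0.0
grad = (engine(weights, x) - target) * x  # illustrative gradient, not a real training loop
weights = weights - 0.1 * grad
y_after = engine(weights, x)
```

After the update, `y_after` is closer to the target than `y_before`: the downloader modified the weights with exactly the tools the releaser has, which is not possible with a stripped binary.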
Llamas aren't open source because the license under which they're released isn't an open-source license. There are plenty of models that are open source though: Mistral's (Apache 2.0), Qwen's (Apache 2.0), DeepSeek's (MIT), GLM's (MIT), and so on.
Additionally, modifying data in binary form was a long-standing practice last time I looked, though I might be misremembering.
> With respect to any multimodal models included in Llama 4, the rights granted under Section 1(a) of the Llama 4 Community License Agreement are not being granted to you if you are an individual domiciled in, or a company with a principal place of business in, the European Union. This restriction does not apply to end users of a product or service that incorporates any such multimodal models.
This is especially strange considering that Llama 3.2 was also multimodal, yet to my knowledge there was no such restriction.
In any case, at least Huggingface seems to be collecting these details now – see for example https://huggingface.co/meta-llama/Llama-4-Scout-17B-16E-Inst...
Curious to see what Ollama will do.
Thank you, also for that article – the tabular summary of changes across the two is great!
Any idea what standing Meta/Llama thinks they have when they write stuff like that?
Is it copyright law? Do they think Llama 4 is copyrighted by them? Is it something else?
It's the same reason a lot of websites block the EU rather than risk being sued under the GDPR.
The GDPR can absolutely be applied to foreign companies https://gdpr.eu/companies-outside-of-europe/ but even if it couldn't, Facebook is an Irish company specifically so they can avoid American taxes. I do not think the EU will have any problem with jurisdiction.
I don't think they care. I'm pretty sure Llama itself trained on a bunch of copyrighted data. Have licence agreements actually mattered?
They want to have their cake and eat it too, though, and these companies are all lobbying hard in a political system with open bribery.
But is that how it works? Not implying that the situation is otherwise comparable, but you can't, for example, ignore the GPL under which a copyrighted piece of code is made available to you, just because "you didn't agree to it".
As I see it (as a non-lawyer), either model weights are copyrightable (then Meta's license would likely be enforceable), or they aren't, but then even "agreeing to the license" shouldn't be able to restrict what you can do with them.
In other words, I'd consider it a very strange outcome if Meta could sue licensees for breach of contract for their use of the weights, but not non-licensees, if the license itself only covers something that Meta doesn't even have exclusive rights to in the first place. (Then again, stranger decisions have been made around copyright.)
If the model weights cannot be copyrighted, you are not violating copyright law by duplicating them in the absence of a license grant, so you gain no benefit in entering into a contract by agreeing to the license.
You can sidestep this by just downloading someone else's quantisation / fine tune.
Imagine you find a printed copy of "Moby Dick". On the first page, you see the statement: "By reading this work, you are entering into a legally binding contract, obligating you to not spoil the ending to anybody under penalty of $100, payable as a charitable donation to NYPL."
Can whoever printed the (out of copyright) book sue you if you spoil the ending and don't make the donation? Would things change if you were to sign the first page with your name?
Not true for a digital book either. However, since you are in possession of a digital copy of something that is definitely copyrightable, you have two options to defend yourself and not go to jail: either rely on the license terms that give you the digital rights to the book (and pay the $100), or have no license, have no defense, and let the judge decide what the penalty should be. Will it be more than $100?
I'm not so sure they do, and even if they did so what? Holding the copyright on some of the data being used in the model doesn't mean they hold the copyright on the model.
> They want to have their cake and eat it
Nemo auditur propriam turpitudinem allegans.
Every good "SotA" model is trained on copyrighted data. This fact becomes apparent when models are released with everything public (i.e. including training data): they score significantly behind on every benchmark.
prob got a sub...
https://ssrc-static.s3.us-east-1.amazonaws.com/OpenAI-Traini...
Of course you probably don’t have enough money to get a ruling on this question, just wanted to point out that (afaik) it is up for debate. Maybe you should just avoid clicking on license agreement buttons, if you can.
Does Google have copyright of their search index? Never tested, as far as I know.
There's definitely copyright when you ask the model to spit out Chapter 3 of a Harry Potter book and it literally gives it to you verbatim (which I've gotten it to do with the right prompts). There's no world where the legal system gives Meta a right to license out content that never belonged to them in the first place.
What's not clear is whether or not the model weights infringe the copyright of the works they were trained on.
Did you have any meaningful hand in constructing the contents?
> One example where this requirement wasn't violated, is on build.nvidia.com
But "Built with Llama" isn't shown prominently, so this is actually an example of a violation of the license.
Seriously, I genuinely wonder what the purpose is of adding random unenforceable licenses to code/binaries. Meta knows people don't read license agreements, so if they're not interested in enforcing a weird naming convention, why stipulate it at all?
Funnily enough, Gemma 3 also probably isn't "open source" if you have a previous understanding of what that is and means, they have their own "Gemma Terms of Use" which acts as the license. See https://ollama.com/library/gemma2/blobs/097a36493f71 for example.