
Posted by m463 4 days ago

FSF statement on copyright infringement lawsuit Bartz v. Anthropic (www.fsf.org)
211 points | 104 comments
psychoslave 11 hours ago|
How dare they? Defending the freedom of these filthy people and the dignity of authors against these nice, familiar corporations!

The rephrased¹ title "FSF Threatens Anthropic over Infringed Copyright: Share Your LLMs Free" certainly doesn’t dramatise enough how odious an act it can be.

¹ Original title is "The FSF doesn't usually sue for copyright infringement, but when we do, we settle for freedom"

rvz 13 hours ago||
> Among the works we hold copyrights over is Sam Williams and Richard Stallman's Free as in freedom: Richard Stallman's crusade for free software, which was found in datasets used by Anthropic as training inputs for their LLMs.

This is the reason why AI companies won't let anyone inspect which content was in the training set. It turns out the suspicions of many copyright holders (including the FSF) were true (of course).

Anthropic and others will never admit it, hence why they wanted to settle and not risk going to trial. AI boosters obviously will continue to gaslight copyright holders into believing nonsense like: "It only scraped the links, so AI didn't directly train on your content!", or "AI can't see like humans, it only sees numbers, binary or digits", or "AI didn't reproduce exactly 100% of the content, just like humans do when tracing from memory!".

They will not share the data-set used to train Claude, even if it was trained on AGPLv3 code.

impossiblefork 10 hours ago||
There are already legal requirements in the EU that you must publish what goes into your training set. This information must apparently be published before August 2 next year.
ronsor 3 hours ago||
Guess the solution is to not do it and simply pay fines (or not pay fines, if you don't have any EU operations).
impossiblefork 30 minutes ago||
Yes, unfortunately. I don't really understand this obsession with regulations that involve fines. One would think that people would have the courage to make laws that either ban things or don't.

I think the fines will effectively be mandatory though, even with no obvious EU operations.

zelphirkalt 12 hours ago||
They simply have way too much incentive to train on anything they can get their hands on. They are running businesses that are billions in losses so far. Someone somewhere is probably being told to feed the monster anything they can get and not to document it, threatened with an NDA and personal financial ruin if proof of it ever came out. Opaque processes acting as a shield, like they do in so many other businesses.
slopinthebag 13 hours ago||
Good. I want to see more lawsuits going after these hyper scalers for blatantly disregarding copyright law while simultaneously benefiting from it. In a just world they would all go down and we would be left with just the OSS models. But we don't live in a fair world :(
mjg59 15 hours ago||
Where's the threat? The FSF was notified that as part of the settlement in Bartz v. Anthropic they were potentially entitled to money, but in this case the works in question were released under a license that allowed free duplication and distribution so no harm was caused. There's then a note that if the FSF had been involved in such a suit they'd insist on any settlement requiring that the trained model be released under a free license. But they weren't, and they're not.

(Edit: In the event of it being changed to match the actual article title, the current subject line for this thread is "FSF Threatens Anthropic over Infringed Copyright: Share Your LLMs Free")

teiferer 14 hours ago||
> but in this case the works in question were released under a license that allowed free duplication and distribution so no harm was caused.

FSF licenses contain attribution and copyleft clauses. It's "do whatever you want with it provided that you X, Y and Z". Just taking the first part without the second part is a breach of the license.

It's like renting a car without paying and then claiming "well you said I can drive around with it for the rest of the day, so where is the harm?" while conveniently ignoring the payment clause.

You may be confusing this with a "public domain" license.

mjg59 13 hours ago|||
If what you do with a copyrighted work is covered by fair use it doesn't matter what the license says - you can do it anyway. The GFDL imposes restrictions on distribution, not copying, so merely downloading a copy imposes no obligation on you and so isn't a copyright infringement either.

I used to be on the FSF board of directors. I have provided legal testimony regarding copyleft licenses. I am excruciatingly aware of the difference between a copyleft license and the public domain.

danlitt 12 hours ago|||
> I am excruciatingly aware of the difference between a copyleft license and the public domain.

Then why did you say "no harm was caused"? Clearly the harm of "using our copylefted work to create proprietary software" was caused. Do you just mean economic harm? If so, I think that's where the parent comment's confusion originates.

mjg59 5 hours ago||
No harm under copyright law
friendzis 11 hours ago||||
> The GFDL imposes restrictions on distribution, not copying, so merely downloading a copy imposes no obligation on you and so isn't a copyright infringement either.

The restrictions fall not only on verbatim distribution, but on derivative works too. I am not aware whether model outputs are settled to be or not to be (hehe) derivative works in a court of law, but that question is at the very least very much valid.

mcherm 10 hours ago|||
It's the third sentence of the article:

> the district court ruled that using the books to train LLMs was fair use but left for trial the question of whether downloading them for this purpose was legal.

friendzis 9 hours ago||
No, those are separate issues.

The pipeline is something like: download material -> store material -> train models on material -> store models trained on material -> serve output generated from models.

These questions focus on the inputs to the model training; the question I have raised focuses on the outputs of the model. If [certain] outputs are considered derivative works of input material, then we have a cascade of questions about which parts of the pipeline are covered by the license requirements. Even if any of the upstream parts of this simplified pipeline are considered legal, it does not imply that the rest of the pipeline is compliant.

superxpro12 8 hours ago||
Consider the net effect and the answer is clear. When these models are properly "trained", are people going to look for the book or a derivative of it, with proper attribution?

Or is the LLM going to regurgitate the same content with zero attribution, and shift all the traffic away from the original work?

When viewed in this frame, it is obvious that the work is derivative and then some.

limagnolia 6 hours ago||
That is your opinion, but the judge disagreed with you. The decision may be overturned on appeal, but as it stands, in that courtroom, the training was fair use.
seba_dos1 6 hours ago|||
I can memorize a song and it will be fair use too, but it won't be anymore once I start performing it publicly. Training itself is quite obviously fair use, what matters is what happens next.
integralid 6 hours ago|||
This is also, unfortunately, the only way this can be settled. Making LLM output legally a derivative work would murder the AI gold rush, and nobody wants that.
protimewaster 3 hours ago|||
I'm also skeptical that it's impossible to get an LLM to reproduce some code verbatim. Google had that paper a while back about getting diffusion models to spit out images that were essentially raw training data, and I wouldn't be surprised if the same is possible for LLMs.
snovv_crash 12 hours ago||||
Models, however, can reproduce copyleft code verbatim, and are being redistributed. Doesn't that count?

Licences like AGPL also don't have redistribution as their only restriction.

shagie 7 hours ago||
Stack Overflow has verbatim copied GPL code in some of its questions and answers. As presented by SO, that code is not under the GPL license (this also applies to other licenses - the BSD advertising clause and the original JSON license will cause similar problems).

Arguably, the use of the code in the Stack Overflow question and answer is fair use.

The problem occurs not when someone reads the Q&A with the improperly licensed code, but rather when they then copy that code verbatim into their own non-GPL product and distribute that without adherence to the GPL.

It's the last step - some human distributing the improperly licensed software that is the violation of the GPL.

This same chain of what is allowed and what is not is equally applicable to LLMs. Providing examples from GPL licensed material to answer a question isn't a license violation. The human copying that code (from any source) and pasting it into their own software is a license violation.

---

Some while back I had a discussion with a Swiss developer about the indefinite article used before "hobbit" in a text game. They used "an hobbit" and in the discussion of fixing it, I quoted the first line of The Hobbit. "In a hole in the ground there lived a hobbit." That cleared it up and my use of it in that (and this) discussion is fair use.

If someone listening to that conversation (or reading this one) thought that the bit that I quoted would be great on a T-shirt and then printed that up and distributed it - that would be a copyright violation.

Google's use of thumbnails for images was found to be fair use. https://en.wikipedia.org/wiki/Perfect_10,_Inc._v._Amazon.com...

    The Ninth Circuit did, however, overturn the district court's decision that Google's thumbnail images were unauthorized and infringing copies of Perfect 10's original images. Google claimed that these images constituted fair use, and the circuit court agreed. This was because they were "highly transformative."
If I were then to take those thumbnails from a Google image search and distribute them as an icon library, I would be guilty of copyright infringement.

I believe that Stack Overflow, Google Images, and LLM models and their output constitute examples of transformative fair use. What someone does with that output is where copyright infringement happens.

My claim isn't that AI vendors are blameless but rather that in the issue of copyright and license adherence it is the human in the process that is the one who has agency and needs to follow copyright (and for AI agents that were unleashed without oversight, it is the human that spun them up or unleashed them).

piker 11 hours ago||||
That's really interesting. I'm a lawyer, and I had always interpreted the license like a ToS between the developers. That (in my mind) meant that the license could impose arbitrary limitations above the default common law and statutory rules and that once you touched the code you were pregnant with those limitations, but this does make sense. TIL. So, thanks.
ronsor 1 hour ago|||
Licenses != contracts, and well, the FSF's position has always been that the GPL isn't a contract, and contracts are what allow you to impose arbitrary limitations. Most EULAs are actually contracts.
graemep 9 hours ago|||
Does the reasoning in the cases where people to whom GPL software was distributed could sue the distributor for source code (rather than relying on the copyright holder suing for copyright infringement) strengthen the argument that arbitrary limitations are enforceable?
dataflow 6 hours ago||||
Unrelated question regarding this part, since you seem to be an expert on this:

> If what you do with a copyrighted work is covered by fair use it doesn't matter what the license says - you can do it anyway.

How is it that contracts can prohibit trial by jury but they can't prohibit fair use of copyrighted work? Is there a list of things a contract is and isn't allowed to prohibit, and explanations/reasons for them?

AnthonyMouse 5 hours ago||
The general answer is because there is a statute or court opinion that says so for one thing and a different one that says something else for the other thing.

It's also relevant that copyright (and fair use) is federal law, contracts are state law and federal law preempts state law.

materialpoint 11 hours ago||||
This means that you can ignore any part of licenses you don't want to and just copy any software you want, non-free software included.
mjg59 5 hours ago|||
No. The GFDL grants you permission to copy the work.
mikkupikku 10 hours ago|||
This is in fact how I operate.
thayne 7 hours ago||||
But fair use is dependent on you getting the work legally. Is downloading a book with the intention of violating the GFDL a legal way of acquiring it?
jcul 14 hours ago||||
This article is talking about a book though, not software.

"Sam Williams and Richard Stallman's Free as in freedom: Richard Stallman's crusade for free software"

"GNU Free Documentation License (GNU FDL). This is a free license allowing use of the work for any purpose without payment."

I'm not familiar with this license or how it compares to their software licenses, but it sounds closer to a public domain license.

kennywinker 13 hours ago|||
It sounds that way a bit from the one sentence. But that’s not the case at all.

> 4. MODIFICATIONS

> You may copy and distribute a Modified Version of the Document under the conditions of sections 2 and 3 above, provided that you release the Modified Version under precisely this License, with the Modified Version filling the role of the Document, thus licensing distribution and modification of the Modified Version to whoever possesses a copy of it. In addition, you must do these things in the Modified Version:

Etc etc.

In short, it is a copyleft license. You must also license derivative works under this license.

Just fyi, the gnu fdl is (unsurprisingly) available for free online - so if you want to know what it says, you can read it!

mjg59 13 hours ago|||
And the judgement said that the training was fair use, but that the duplication might be an infringement. The GFDL doesn't restrict duplication, only distribution, so if training on GFDLed material is fair use and not the creation of a derivative work then there's no damage.
kennywinker 3 hours ago|||
> The GFDL doesn't restrict duplication

Right. I can publish the work in whole without asking permission. That’s unrestricted duplication.

However, as i read it, an LLM spitting out snippets from the text is not “duplicating” the work. That would fall under modifications. From the license:

> A "Modified Version" of the Document means any work containing the Document or a portion of it, either copied verbatim, or with modifications and/or translated into another language.

I read that pretty clearly as any work containing text from a gnu fdl document is a modification not a duplication.

mjg59 2 hours ago||
There are three steps here:

1) Obtaining the copyrighted works used for training. Anthropic did this without asking for the copyright holders' permission, which would be a copyright violation for any work that isn't under a license that grants permission to duplicate. The GFDL does, so no issue here.

2) Training the model. The case held that this was fair use, so no issue here.

3) Whether the output is a derivative work. If so then you get to figure out how the GFDL applies to the output, but to the best of my knowledge the case didn't ask this question, so we don't know.

leni536 11 hours ago|||
Last time I checked online LLMs distribute parts of their training corpus when you prompt them.
onion2k 13 hours ago|||
For this to stand up in court you'd need to show that an LLM is distributing "a modified version of the document".

If I took a book and cut it up into individual words (or partial words even), and then used some of the words with words from every other book to write a new book, it'd be hard to argue that I'm really "distributing the first book", even if the subject of my book is the same as the first one.

This really just highlights how the law is a long way behind what's achievable with modern computing power.

kennywinker 2 hours ago|||
You’re just describing transformative use. I’m not a lawyer, but an example from music - taking a single drum hit from a james brown song is apparently not transformative. Taking a vibe from another song is also maybe not transformative, e.g. robin thicke and pharrell’s “blurred lines” was found to legally take the “feel” from Marvin Gaye’s “Got to Give it Up”

Which is all to say that the law is actually really bad at determining what is right and wrong, and our moral compasses should not defer to the law. Unfortunately, moral compasses are often skewed by money - like how normal compasses are skewed by magnets.

ndsipa_pomu 12 hours ago||||
Presumably, a suitable prompt could get the LLM to produce whole sections of the book which would demonstrate that the LLM contains a modified version.
p_l 12 hours ago||
Yes, and for practical purposes the current consensus (and in the case of the EU, the law) is that only said document would be covered by the FDL.
kennywinker 2 hours ago||
I am distributing an svg file. It’s a program that, when run, produces an image of mickey mouse.

By your description of the law, this svg file is not infringing on disney’s copyright - since it’s a program that when run creates an infringing document (the rasterized pixels of mickey mouse) but it is not an infringing document itself.

I really don’t think my “i wrote a program in the svg language” defense would hold up in court. But i wonder how many levels of abstraction before it’s legal? Like if i write the mickey-mouse-generator in python does that make it legal? If it generates a variety of randomized images of mickey mouse, is that legal? If it uses statistical analysis of many drawings of mickey to generate an average mickey mouse, is that legal? Does it have to generate different characters if asked before it is legal? Can that be an if statement or does it have to use statistical calculations to decide what character i want?

rcdwealth 9 hours ago|||
[dead]
karel-3d 13 hours ago|||
FDL is famously annoying.

wikipedia used to be under FDL and they lobbied FSF to allow an escape hatch to Commons for a few months, because FDL was so annoying.

ghighi7878 11 hours ago||||
Telling mjg59 they are confused about a license is an audacious move. But I understand your question and I have the same question.
Dylan16807 13 hours ago|||
They don't need the "do whatever" permission if everything they do is fair use. They only need the downloading permission, and it's free to download.
darkwater 12 hours ago|||
I don't like the editorialized title either but I would say that the actual post title

"The FSF doesn't usually sue for copyright infringement, but when we do, we settle for freedom"

and this sentence at the end

" We are a small organization with limited resources and we have to pick our battles, but if the FSF were to participate in a lawsuit such as Bartz v. Anthropic and find our copyright and license violated, we would certainly request user freedom as compensation."

could be seen as "threatening".

lelanthran 14 hours ago|||
It's just an indication to model trainers that they should take care to omit FSF software from training.

Not a nothing burger, but not totally insignificant either.

mjg59 14 hours ago||
Is it? The FSF's description of the judgement is that the training was fair use, but that the actual downloading of the material may have been a copyright infringement. What software does the FSF hold copyright to that can't be downloaded freely? Under what circumstances would the FSF be in a position to influence the nature of a settlement if they weren't harmed?
jfoster 13 hours ago||
Is harm necessary to show in a copyright infringement case?
mjg59 13 hours ago||
Copyright infringement causes harm, so if there's no harm there's no infringement. You can freely duplicate GFDLed material, so downloading it isn't an infringement. If training a model on that downloaded material is fair use then there's no infringement.
eschaton 14 hours ago||
[flagged]
mjg59 14 hours ago|||
If it's pretty fucking simple, can you point to the statement in the linked post that supports this assertion? What it says is "According to the notice, the district court ruled that using the books to train LLMs was fair use", and while I accept that this doesn't mean the same would be true for software, I don't see anything in the FSF's post that contradicts the idea that training on GPLed software would also be fair use. I'm not passing a value judgement here, I'm a former board member of the FSF and I strongly believe in the value and effectiveness of copyleft licenses, I'm just asking how you get from what's in the post to such an absolute assertion.
boramalper 13 hours ago||
Yet another instance of people jumping to comments based on the title of the submission alone. They don't mention GPL even once in that post...
sunnyps 14 hours ago|||
It's pretty fucking simple: a judge needs to decide that, not armchair lawyers on HN.
Bombthecat 14 hours ago||
We know AI will be pushed through. No matter the laws
agile-gift0262 13 hours ago||
what I keep wondering is what kind of laws will be rendered useless by the precedent they'll cause. Can this be the beginning of the end of copyright and intellectual property?
tsimionescu 13 hours ago|||
Copyright, possibly. Intellectual property more broadly, no. AI has 0 impact on trademark law, quite clearly (which is anchored in consumer protection, in principle). Patent law is perhaps more related, but it's still pretty far.
volkercraig 7 hours ago||||
Why are you wondering? Any law that limits the ability of capital owners to extract wealth will be overturned, and not just for AI; that's global, in every industry, everywhere there are humans.
Bombthecat 13 hours ago||||
In a way, I think so. Just let the AI recreate existing code, say it's AI code, and claim it doesn't break any copyright laws.
duskdozer 12 hours ago|||
Doubt it. I'm sure it will have an exclusion where for example using genAI to train on or replicate leaked or reverse-engineered Windows code will constitute copyright infringement, but doing the same for copyleft will be allowed. Always in favor of corporate interests.
grodriguez100 13 hours ago||
Is the FSF threatening Anthropic? The way I read it looks like they are not:

> We are a small organization with limited resources and we have to pick our battles, but if the FSF were to participate in a lawsuit such as Bartz v. Anthropic and find our copyright and license violated, we would certainly request user freedom as compensation.

Sounds more like “we can’t and won’t sue, but this is the kind of compensation that we think would be appropriate”

raincole 12 hours ago||
HN really needs some stricter rules for editorialized title. The HN title has nothing to do with the link (unless the article is edited?)
latexr 12 hours ago||
The rule is fine and clear, it just wasn’t followed here. There’s no reason to have a stricter rule, what you’re complaining about is its enforcement. Two moderators can’t read everything, if you have a complaint, email them (contact link at the bottom of the page), they are quite responsive.
touristtam 12 hours ago||
flag the submission?
politelemon 14 hours ago||
The title is:

The FSF doesn't usually sue for copyright infringement, but when we do, we settle for freedom

lokimoon 9 hours ago||
[dead]
khalic 12 hours ago||
Misleading title
Kwpolska 14 hours ago|
[flagged]
tomhow 13 hours ago||
Please don't fulminate on HN. The guidelines make it clear we're trying for something better here. https://news.ycombinator.com/newsguidelines.html
mcv 14 hours ago|||
It's not the FSF's job to provide Anthropic with a business model. If it turns out that their business model depends entirely on copyright violation, they might not have a business model. That's true regardless of whether you think the case has any merit.
solid_fuel 13 hours ago|||
> Classic Hollywood, completely detached from reality. Did they propose any way for The Pirate Bay to continue earning any revenue, paying for its hosting, and developing new features, if they aren't allowed to redistribute movies for free?
kouteiheika 13 hours ago|||
Although it might not satisfy FSF there is a very simple way to do it - commit to release your models for free X months after they're first made available.
Shitty-kitty 13 hours ago|||
Should the FCC be proposing how robocallers continue earning any revenue in its decisions?
leni536 11 hours ago|||
https://smbc-comics.com/index.php?db=comics&id=1060#comic
slopinthebag 13 hours ago|||
What the fuck? How is that their problem?

"Yeah we can't prosecute this person for stealing your car, because you haven't considered how they're going to get to work"

pwdisswordfishy 13 hours ago||
OH NO THE POOR CAPITALISTS