Posted by senaevren 1 day ago

Who owns the code Claude Code wrote? (legallayer.substack.com)
441 points | 405 comments | page 3
e12e 1 day ago|
Seems to gloss over other kinds of contamination, beyond GPL code. Code from pirated text books, the problem with the entire language model being trained on copyright data, and on the possibility of the training data containing various copyrighted code.
embedding-shape 1 day ago|
> Code from pirated text books

Anthropic "solved" this by intermingling the texts extracted from pirated books (illegal) with texts extracted from the physical books they bought and destroyed (legal), so no one can clearly say if the copyrighted material it spits out came from a legal source or not. Everyone rejoiced.

thyrsus 15 hours ago|||
I've seen copyright notices that explicitly forbid use for AI training. Would this "transformation" argument still hold in such cases?

For example:

No Generative AI Training Use

For avoidance of doubt, Author reserves the rights, and grants no rights to, reproduce and/or otherwise use the Work in any manner for purposes of training artificial intelligence or machine learning technologies to generate text, text to speech, voice, or audio including without limitation, technologies that are capable of generating works in the same style or genre as the Work, unless individual or entity obtains Author’s specific and express permission to do so. Nor does any individual or entity have the right to sublicense others to reproduce and/or otherwise use the Work in any manner for the purposes of training artificial intelligence or machine learning technologies to generate text, text to speech, voice, or audio without Author’s specific and express permission.

e12e 22 hours ago||||
> books they bought and destroyed (legal)

They're only legal if training is fair use - and even then I don't think it's immediately clear what the legal status would be of verbatim regurgitation of code under copyright, or of code protected by patents?

AFAIK I (as a human developer) can't just copy code out of a textbook, then claim copyright over it and charge for a license to it?

embedding-shape 21 hours ago||
> They're only legal if training is fair use

The judge seems to have said it's because they "transformed" the books (destroying them after digitizing) in the process, and that made it legal.

> Ultimately, Judge William Alsup ruled that this destructive scanning operation qualified as fair use—but only because Anthropic had legally purchased the books first, destroyed each print copy after scanning, and kept the digital files internally rather than distributing them. The judge compared the process to “conserv[ing] space” through format conversion and found it transformative. - https://arstechnica.com/ai/2025/06/anthropic-destroyed-milli...

e12e 19 hours ago||
Interesting - so local models, like Google Gemini, are then likely pirated by this interpretation, because the model is distributed? Ditto open-weight models?
senaevren 1 day ago|||
The intermingling argument is actually central to the Bartz settlement structure. The settlement required destruction of the pirated dataset specifically because commingled training data creates an unresolvable provenance problem. For deployers building on Claude, EDPB Opinion 28/2024 requires a documented assessment of the foundation model's training data legal basis before deployment. "We cannot tell which outputs came from which source" is not a satisfactory answer to a regulator running that assessment. I wrote about it before here: https://legallayer.substack.com/p/i-read-every-edpb-document...
hackingonempty 1 day ago||
Nobody disputes that I own the copyright in a sound recording I made just by pushing the red button on my recorder. So it is a mystery to me that copyright to any sort of human conditioned machine generation is in dispute.
senaevren 1 day ago|
The sound recording analogy breaks down at the point where the recorder makes no creative decisions. Pressing record captures what is already there. Prompting Claude generates something that did not exist, through decisions the model makes about structure, naming, pattern, and implementation. The closer analogy is hiring a session musician and telling them the key and tempo. You own the recording under work-for-hire if they signed the right contract, but the creative expression in the performance is theirs unless explicitly assigned. The button you push to start the model is not the same button as the one on the recorder.
hackingonempty 18 hours ago|||
> Prompting Claude generates something that did not exist, through decisions the model makes about structure, naming, pattern, and implementation.

LLMs don't make decisions. Their output is completely determined by an algorithm using the human prompt, fixed weights, and a random seed. No different than the many effects humans use in image or audio editors. Nobody ever questioned whether art made using only those effects on a blank canvas was subject to copyright.
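That determinism claim can be sketched with a toy model (the weights and names below are invented for illustration, not any real LLM): once the prompt, the weights, and the seed are fixed, the output is a pure function of its inputs.

```python
import random

# Toy "model": fixed weights map the previous token to a next-token
# probability distribution (a stand-in for frozen neural-net weights).
WEIGHTS = {"the": {"cat": 0.6, "dog": 0.4}, "cat": {"sat": 1.0}}

def generate(prompt_token, seed, steps=2):
    rng = random.Random(seed)  # all randomness comes from the seed
    out = [prompt_token]
    for _ in range(steps):
        dist = WEIGHTS.get(out[-1])
        if not dist:
            break  # no continuation defined for this token
        tokens, probs = zip(*sorted(dist.items()))
        out.append(rng.choices(tokens, weights=probs)[0])
    return out

# Same prompt + same weights + same seed => byte-identical output.
assert generate("the", seed=42) == generate("the", seed=42)
```

Whether a pipeline like this constitutes "making decisions" is exactly the question the thread is arguing about; the sketch only shows the reproducibility part.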

CamperBob2 20 hours ago|||
Fourier theory says that any sound, however complex, can be synthesized by summing sines and cosines. That's what an LLM does, if you twist the metaphor enough. It synthesizes complex outputs from simpler basis functions that are, or should be, uncopyrightable.
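For reference, the identity the metaphor leans on is the standard Fourier series (this is the textbook formula, not anything an LLM literally computes):

```latex
f(t) = \frac{a_0}{2} + \sum_{n=1}^{\infty}\left[a_n \cos\!\left(\tfrac{2\pi n t}{T}\right) + b_n \sin\!\left(\tfrac{2\pi n t}{T}\right)\right],
\qquad
a_n = \frac{2}{T}\int_0^T f(t)\cos\!\left(\tfrac{2\pi n t}{T}\right)\,dt,
\quad
b_n = \frac{2}{T}\int_0^T f(t)\sin\!\left(\tfrac{2\pi n t}{T}\right)\,dt.
```

The sines and cosines are the uncopyrightable "basis"; the coefficients, inferred from the signal, are where all the specificity lives.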

The fact that it inferred those basis functions from studying copyrighted works doesn't seem relevant. Nor does the fact that the "Fourier sums" sometimes coincide with larger fragments of works that are copyrighted. How weird would it be if that didn't happen?

array_key_first 15 hours ago||
Of course it's relevant. How copyright infringement happens doesn't actually matter, all that matters is that the infringement happened.

If I painstakingly recreate A New Hope frame by frame, pixel by pixel, that's infringement. Even if I technically used 0 content from the original.

CamperBob2 14 hours ago||
Nobody is doing that, though. You might get a watermarked screenshot or stock photo now and then, or a couple of mostly-verbatim paragraphs from Harry Potter.

In any case, if the copyright mafia insists on butting heads with AI, they'll find that the fight doesn't quite play out the way it has in the past.

metalcrow 21 hours ago||
"if Claude was trained on the LGPL-licensed codebase and its output reflects patterns learned from that code, can the output be treated as license-free? The emerging legal consensus is probably not, and assuming it can creates significant liability for anyone shipping that code commercially."

Is there any citation for this "legal consensus"? I was not aware of any evidence-backed stance on this topic as of yet.

onlyrealcuzzo 21 hours ago||
This sounds like a problem that's pretty easy to get around.

CC does not need LGPL code. There's more than enough BSD and Apache code to go around.

And they can generate synthetic data that is better than LGPL for their training.

It's also a problem that does not seem feasible to meaningfully enforce.

It's easy to generate code with CC and claim you didn't. It would be hard to prove that you did, especially if you took even slight precautions to make detection difficult.

adrian_b 20 hours ago||
Unlike GPL, BSD and Apache licenses do not claim to also cover your non-AI-generated code that only invokes the AI-generated code.

However, even if the BSD/Apache/MIT licensed code can be incorporated freely in your application, you still have no right to remove the copyright notices from it and/or to claim that you own the copyright for it.

Therefore, unless the AI model has been trained only on non-copyrighted public-domain code, incorporating the generated code in your application means that you have removed the copyright notices from it, which is not allowed by the original licenses.

There is absolutely no doubt that using an AI coding assistant works around the copyright laws, but it is still equivalent to copying and pasting fragments from copyrighted works into your source code.

I consider that copyright should not apply to program sources, at least not in its current form, so reusing parts from other programs should be fair use - but only if human programmers are allowed to do the same.

onlyrealcuzzo 14 hours ago||
> However, even if the BSD/Apache/MIT licensed code can be incorporated freely in your application, you still have no right to remove the copyright notices from it and/or to claim that you own the copyright for it.

I can't speak for all licenses, but I'm familiar with at least one BSD license. That's almost the entire point of it...

You cannot take their literal code and call it your own. You can derive code from it and call it your own. That's what LLMs primarily do.

NoMoreNicksLeft 18 hours ago|||
With sufficient obfuscation (which models seem to provide intrinsically), how would anyone know to sue? On top of that, only the most major sorts of litigation have the legal force to pierce even the flimsiest of obfuscation... this is likely all moot.

If some GPL-licensed group were to sue some commercial software project that they do not have the source code for, what would even give it away? Suppose they throw $1 million at a lawyer who somehow gets it to the discovery phase, and the source code is produced. It looks to be shit, but maybe an expert witness comes along and says "that looks inspired by the open source project". Where does it go from there? The model is a black box, but maybe a superhero lawyer manages to rope in Anthropic or OpenAI, and you can see how it produced the code given those prompts. What now? Are there any expert witnesses who both could and would say that it was "bulk copying-and-pasting code"? And if so, what jury is going to go for that theory of the crime? Copying and pasting, but the code doesn't match, except in short little strings that any code might match. This isn't a slam dunk, and it's not going to proceed very far unless it's another Google-vs-Oracle shitfest.

senaevren 19 hours ago|||
The chardet dispute is the closest thing to an active test case on this specific question, and you are right that it has not resolved into settled law. "Emerging legal consensus" was imprecise. The more accurate framing is: the legal community's working assumption, based on how copyright doctrine treats derivative works, is that training-data provenance travels with the output. That assumption has not been tested definitively in court yet.
senaevren 19 hours ago||
Thanks for this; it's definitely a fair point. I updated the piece to reflect it.
heikkilevanto 18 hours ago||
Ownership is one question. IMO, a more interesting question is who is responsible when the code does some real-life damage.
mock-possum 18 hours ago||
Why should it be any different than it ever was? If a release manager checked it but didn’t catch the vulnerability, they have some culpability. If the developer shipped the code without checking it, they have some culpability too. Ultimately, if they both work under an organization that they report to, they’re responsible to that organization, which is, in turn, accountable to its customers (and investors perhaps.)

LLMs really change nothing about this.

ACCount37 18 hours ago||
No one. The usual.
raggi 7 hours ago||
Lawyers I have spoken to have stated strongly that they believe collective works doctrine will provide strong protections for most mature and sizable software. I see no mention of these considerations here.
TheFirstNubian 1 day ago||
The elephant in the room, of course, is what constitutes “meaningful human authorship.” However, I cannot shake off the feeling that all user interactions with these AI models are being logged. Perhaps this may turn out to be the bigger concern in a potential legal battle than code authorship.
senaevren 1 day ago|
The meaningful human authorship question is the elephant, agreed, and the regulators have deliberately refused to quantify it, for exactly the reason you describe: any bright-line number becomes a target to game rather than a standard to meet.

The logging point is sharper than it might appear. In a copyright dispute over AI-assisted code, interaction logs could cut both ways. A plaintiff trying to establish human authorship would want the logs to show substantial architectural redirection, multiple rejections of Claude output, and documented reasoning for structural decisions. A defendant challenging that authorship claim would subpoena the same logs to show verbatim acceptance of output without modification.

The practical implication, I guess, is that developers who want to preserve a copyright claim over AI-assisted code should treat their prompt history as a legal document from the start. All over the world, the logs are the evidence. Whether they help or hurt depends entirely on what they show.

TheFirstNubian 18 hours ago||
The bit about treating one’s prompt history as a legal document has really struck a nerve with me. I’ve been keeping a separate git history solely for my prompts. Initially, the goals were simple: reuse prompts, turn some into skills, etc. But in light of the insights from the article and the discussions here, I need to treat this practice as serious business.
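A sketch of that practice (the repo layout, file names, and note format below are made up for illustration): a dedicated git repo gives each prompt/outcome note a timestamped, tamper-evident commit, separate from the code itself.

```python
import os
import subprocess
import tempfile

# Hypothetical workflow: record each prompt, plus whether its output was
# accepted or rejected, as a dated note in its own git repository.
repo = tempfile.mkdtemp()

def git(*args):
    # Run a git command inside the prompt-log repo, failing loudly on error.
    subprocess.run(["git", "-C", repo, *args], check=True,
                   stdout=subprocess.DEVNULL)

git("init")
git("config", "user.email", "dev@example.com")  # placeholder identity
git("config", "user.name", "Dev")

note = os.path.join(repo, "2024-06-01-parser-refactor.md")
with open(note, "w") as f:
    f.write("PROMPT: Refactor the parser into a streaming design.\n"
            "OUTCOME: first draft rejected; second accepted after renames.\n")

git("add", ".")
git("commit", "-m", "prompt: parser refactor, draft 1 rejected")
# `git log` in this repo is now a dated record of the interaction.
```

Whether such a log would actually satisfy a court is the open question from the article; the point of the sketch is only that the record exists before any dispute does.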
zuzululu 20 hours ago||
I think it's pretty clear cut: whoever is paying for your agentic coding tool subscription is part of the litmus test.

If I use my own computer, pay for my own subscription, and build my own open source projects, then the code belongs to me.

If I use my company's computer, they pay for my subscription, and we work on the company's projects, then the code belongs to the company.

At any step along the way, if some copyleft or other exotic open source license is violated, who pays for discovery? Is it someone in Russia who created a popular OSS library and is now owed? How will it be enforced?

kazinator 13 hours ago||
> Code that Claude Code or Cursor generated and you accepted without meaningful modification may not be copyrightable by anyone.

Except if it happens to regurgitate a significant excerpt of some existing work, then the authors of that can assert their copyright; i.e. claim that it infringes.

randyrand 17 hours ago||
Normally this is solved with an employment contract: "Anything you write, the copyright is transferred to your employer."
joshka 1 day ago|
If you want to go much deeper, https://www.copyright.gov/ai/ is particularly good at least on the side of comprehensiveness.