Posted by senaevren 1 day ago

Who owns the code Claude Code wrote? (legallayer.substack.com)
429 points | 393 comments
semiquaver 16 hours ago|

  > The US Copyright Office confirmed this in January 2025, and the Supreme Court declined to disturb it in March 2026 when it turned away the Thaler appeal. Works predominantly generated by AI without meaningful human authorship are not eligible for copyright protection, and that rule is now settled at the highest judicial level available.
Misstates the law. Denial of certiorari can happen for many reasons unrelated to the merits and does not settle the issue nationwide.
PaulDavisThe1st 10 hours ago||
From TFA:

> When the Supreme Court declined to hear the Thaler appeal in March 2026, it did not endorse the lower court's reasoning or settle the question nationally. Cert denial means the Court chose not to hear the case, nothing more. What it does mean is that the DC Circuit's ruling stands, the Copyright Office's position is intact, and no court has yet gone the other way.

Your quoted text is no longer in TFA.

jibal 7 hours ago|||
Because the author acted on that comment.
semiquaver 8 hours ago|||
cf. OP’s comments in this thread.
21asdffdsa12 1 hour ago|||
Let's hire humans as pAIrrots? They see it, they rearrange it, they rename variables, and then they "authored" it. What a job to start out in as a junior, but if you understand what's happening, you may augment the AI's code by giving "feedback", given enough time.
consp 22 minutes ago|||
Ah, the infamous "no, I wrote it myself" submission in university coursework. Usually gets you a free visit to the guidance counselor and a bonus free mark (on your three-strikes-and-you're-out plagiarism form).
streetfighter64 1 hour ago|||
Free water but not electricity? I'll just hook up a generator to the shower...

These sorts of simplistic loopholes rarely work. Imagine if you could claim copyright on the Linux kernel just by rearranging it and renaming a few variables.

sylware 4 minutes ago||
I wonder how much of linux and *BSD is in the windows kernel.
greensoap 16 hours ago|||
Also, I don't think there is any example testing the conclusion. There is no case to point at that any of the factors they listed are sufficient to convey authorship. Would love to be pointed to a case where rejecting decisions and redirecting to a different approach was deemed human authorship. What we do know is that you can disclaim the part of the code a human didn't author. In fact, the Copyright Office requires you disclose and disclaim. If anyone out there has more factual and citable sources please share.
KallDrexx 11 hours ago|||
It's in fact the opposite, from what I've read. In one of the Supreme Court cases cited by the Copyright Office itself in its opinion on AI works (https://en.wikipedia.org/wiki/Community_for_Creative_Non-Vio...), the Court held that merely advising someone who does the work for you, and giving criticisms and revisions, isn't enough for authorship or co-authorship.

While it's not code related, the Copyright Office's opinion is a good read, and I don't see any reason to believe its opinion is different for works of text vs works of physical art: https://www.copyright.gov/ai/Copyright-and-Artificial-Intell...

senaevren 15 hours ago|||
You are right that no court has yet ruled that a specific set of human contributions to AI-assisted work was sufficient to establish authorship. What exists is the inverse: the Copyright Office has granted partial registrations where human-authored elements were separated from AI-generated elements, as in Zarya of the Dawn, where the human-written text was protected but the Midjourney images were not. The Allen v. Perlmutter case pending in Colorado is the first direct judicial test of whether iterative prompting and editing can constitute authorship. Until that decision, the positive threshold is genuinely unknown. The piece reflects this in the calibration section at the end, though your point is worth adding to the authorship discussion more explicitly.
matheusmoreira 12 hours ago|||
> meaningful human authorship

How is this defined? Is my code review "meaningful" ? Are my amendments and edits to the generated code "human authorship" ?

cooper_ganglia 11 hours ago|||
From the article:

> Specifying an objective to the model is not enough. Directing how the work is constructed is what counts.

TurdF3rguson 7 hours ago||
That's interesting but how is anyone supposed to prove it? They would have to get their hands on your prompts.
swiftcoder 9 minutes ago|||
> They would have to get their hands on your prompts

Unless you are running a local model, your prompts are almost certainly logged by your inference provider, and would only be a subpoena away?

archargelod 6 hours ago|||
Leaks, whistleblowers. Some circumstantial evidence will also do if there's enough of it. Like having hallucinated parts of code that do absolutely nothing, and can't be explained as e.g. leftovers from a refactor.
wayeq 12 hours ago|||
read the article?
DrewADesign 12 hours ago|||
But it means that the appellate decision will retain its precedential force, no? Wouldn't losing that precedent be the primary legal effect of overturning the decision? All case law that hasn't touched the Supreme Court could theoretically be challenged, but most of it isn't, and it's considered the law until it isn't anymore, right? How would this be any different?
semiquaver 11 hours ago||
The decision is binding only within the jurisdiction of the Court of Appeals for the D.C. Circuit.

So it’s not correct to say “because SCOTUS denied cert, Thaler is now binding national copyright law.”

Practically speaking, it is binding on the US Copyright office (one of the parties in the case) in CADC. And that’s important. But copyright litigation happens all across the country, while this ruling only directly constrains the relatively small number of cases within CADC.

DrewADesign 9 hours ago||
Yes, I didn’t imply national precedence. I imagine it would also signal to attorneys appealing cases in other circuits that the same challenge will likely yield the same result.
senaevren 16 hours ago|||
Fair and correct. Cert denial means the Court declined to hear the case, not that it endorsed the lower court's reasoning or settled the question nationally. The DC Circuit ruling stands and the Copyright Office's position is consistent, but that is stable doctrine rather than Supreme Court-settled law. Updated the piece to reflect this distinction accurately.
sowbug 15 hours ago||
Since this is a tech audience... the Supreme Court uses a bounded priority queue. An unbounded queue would risk growing impractically large.

There are some kinds of cases where the Court has "original jurisdiction," meaning they must hear them, but those are very rare.
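The queue analogy can be sketched in Python. The `BoundedDocket` class, case names, and priorities are all invented for illustration; the real cert process is obviously not literally a heap, this just shows the bounded-priority-queue behavior being joked about:

```python
import heapq

class BoundedDocket:
    """Toy bounded max-priority queue: keep only the k highest-priority
    petitions; everything else is simply never taken up (cert denied)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._heap = []  # min-heap of (priority, case) holding the current top-k

    def petition(self, priority, case):
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, (priority, case))
        elif priority > self._heap[0][0]:
            # Bump the lowest-priority petition currently on the docket.
            heapq.heapreplace(self._heap, (priority, case))
        # else: dropped -- and the drop says nothing about the merits.

    def granted(self):
        return [case for _, case in sorted(self._heap, reverse=True)]

docket = BoundedDocket(capacity=2)
docket.petition(3, "circuit split")
docket.petition(1, "routine appeal")
docket.petition(5, "constitutional question")
print(docket.granted())  # ['constitutional question', 'circuit split']
```

The "routine appeal" falls off the queue silently, which is the point of the analogy: falling off is capacity management, not a ruling on the case.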

freejazz 15 hours ago|||
It does settle the law insofar as it maintains the status quo.
jmyeet 16 hours ago||
The Supreme Court declining to take up an issue is taking a position.

Now different circuits can take a different view of the same issue. This is a common reason why the Supreme Court will grant cert: to resolve a circuit split. Appeals court judges know this and have at times (allegedly) intentionally split to force an issue to the Supreme Court.

Even without settling the issue appeals courts will look at how other circuits have ruled and be guided by their reasoning, generally. The fact that the Supreme Court declined to grant cert actually carries weight.

semiquaver 15 hours ago|||

  > The Supreme Court declining to take up an issue is taking a position.
No it is not.

  > “The denial of a writ of certiorari imports no expression of opinion upon the merits of the case, as the bar has been told many times.”
United States v. Carver, 260 U. S. 482, 490 (1923).

Moreover, SCOTUS does not decide issues, they decide cases.

  > “We are acutely aware, however, that we sit to decide concrete cases, and not abstract propositions of law.”
Upjohn Co. v. United States, 449 U. S. 383, 386 (1981).
greensoap 15 hours ago||||
The real issue is that the Thaler case asked a different question: "Can an AI be an author?" The lower court said no, and SCOTUS left it alone. But the question of "what is enough for the human to be the author" wasn't even part of the case. That question remains completely unexamined.
jongjong 6 hours ago|||
Logically, I think there's a big difference between code produced from a single generic prompt with no other input and code produced from multiple complex prompts with a large existing codebase as input.

When I'm feeding AI my code as input and it ends up producing new code which adheres to my architecture, my coding style, and my detailed technical requirements, the copyright over the output should be mine, since the code looks exactly like what I would have produced by hand; there is no creative input from the AI. It's just a code completion tool to save time.

I understand if someone leaves an LLM running as an agent for multiple days and it produces a whole bunch of code, then it's a very different process.

senaevren 15 hours ago|||
Fair point and worth being precise about. Cert denial is not meaningless: it leaves the lower court ruling intact, it signals the Court did not find the issue urgent enough to resolve now, and as you note, other circuits will look at the DC Circuit's reasoning. What it does not do is bind other circuits or establish Supreme Court precedent. The distinction matters here because if a Ninth Circuit case involving AI-generated code reaches a different conclusion, that circuit split would be live law regardless of the Thaler cert denial.
Arcuru 18 hours ago||
Personally, I think that the human directing the agent owns the copyright for whatever is produced, but the ability for the agent to build it in the first place is based off of stolen IP.

I'm concerned about the copyright 'washing' this enables though, especially in OSS, and I think the right thing for OSS devs to do is to try to publish resulting code with the strongest copyleft licensing that they are comfortable with - https://jackson.dev/post/moral-ai-licensing/

nadermx 17 hours ago||
Funny how the copyright industry was able to spin copyright infringement into the pejorative "stealing". If you still have the item, what was stolen?

Dowling v. United States, 473 U.S. 207 (1985): The Supreme Court ruled that the unauthorized sale of phonorecords of copyrighted musical compositions does not constitute "stolen, converted or taken by fraud" goods under the National Stolen Property Act

tensor 17 hours ago|||
I still find the idea that "learning" from code is "stealing" kind of ridiculous.
array_key_first 12 hours ago|||
The "learning" isn't really learning. I mean, it might be, but if you define learning to be a human endeavor, then AI can't learn.

It's perfectly reasonable to say it's okay for humans to do something but not okay for a computer program to do the same thing. We don't have to equate AI to humans, that's a choice and usually a bad one.

tensor 8 hours ago|||
It's also perfectly reasonable to say it's ok for a program or machine to do the same thing as a human. This has been the basis for the technological revolution since the dawn of technology.
leereeves 34 minutes ago||
It's legal and perfectly reasonable for a human being to combine organic fuels with oxygen from the air to create energy and CO2. Any law restricting that would be the worst form of tyranny.

It would not be reasonable to allow machines to do that at unlimited scale without restrictions.

(Hopefully the fossil fuels industry won't draw inspiration from the legal arguments made by AI companies...)

aeon_ai 11 hours ago|||
If one defines 'flying' to be a bird's endeavor, then humans can't fly.

Now, if you'll excuse me, I need to catch a metal shuttle that chucks itself through the air on wings.

greendestiny 10 hours ago||
Sure, as a word it can be broad; as a concept in our legal system it should be much more nuanced.

The relevant extension of your analogy is should birds be required to obey FAA rules? Or should plane factories be protected as nesting sites?

nadermx 9 hours ago||
Relevant: https://www.bluewin.ch/en/news/swiss-company-builds-airport-...
boh 15 hours ago||||
Yes I guess there's also no such thing as stealing in torrents since the computer "learns" the data and returns it in a transcoded fashion so it's technically not a reproduction. Yes LLMs can reproduce passages from copyrighted works verbatim but that's only because it "learned" it and it's just telling you what it "knows".

The mental calisthenics required to justify this stuff must be exhausting.

idle_zealot 15 hours ago|||
> The mental calisthenics required to justify this stuff must be exhausting.

It's only exhausting if you think copyright ever reasonably settled the matter of ownership of knowledge and want to morally justify an incoherent set of outcomes that you personally favor. In practice it's primarily been a tool for the powerful party in any dispute to hammer others for disrupting their business model. I think that's pretty much the only way attempting to apply ownership semantics to knowledge or information can end up.

balamatom 4 hours ago||
Correct.

Knowledge consists of, roughly speaking, thoughts.

(a "justified true belief" - per https://plato.stanford.edu/entries/knowledge-analysis/ - is a kind of thought)

The "thinking" part of a "thinking being" - that also consists of thoughts.

If your knowledge is someone's property, you are someone's property.

A society where all knowledge is proprietary, is a society of ubiquitous slavery.

Maybe multi-layered, maybe fractional, maybe with a smiley-face drawn on top.

Doesn't matter.

spankalee 10 hours ago|||
Humans have been known to recite entire parts from plays from memory, live in front of audiences even.
leni536 4 hours ago||
And they are legally required to license the play to do that, if it's still in copyright.
spankalee 51 minutes ago||
Only to perform it, not learn it.
leni536 25 minutes ago||
And LLMs perform when you prompt them.
greendestiny 10 hours ago||||
I think that it's absurd that we've jumped to the conclusion backpropagation in neural networks should be legally treated the same as human learning.

I mean, I don't think I could find a better description for following the derivatives of error in reproducing a set of works than creating a "derivative work".

alok-g 6 hours ago||
>> ... we've jumped to the conclusion backpropagation in neural networks should be legally treated the same as human learning.

I agree. However, the reverse is also likely true: it cannot currently be denied that learning in humans is different from learning in artificial neural networks, at least from the point of view of producing works that mix ideas/memes from several works processed/read. Surely, as the article says, copyright law talks exclusively about humans, not machines, not animals.

greendestiny 6 hours ago||
I understand the article. The point about 'learning' is that if the model and its outputs are derivative works, then the copyright belongs to the human creators of the works it was trained on.

Edit: Or, perhaps put more pseudo-legally, the created works infringe on the copyrights of the original human creators.

alok-g 5 hours ago||
The part I agree with is that copyright law calls out humans specifically as the potential owners of copyright. So what you suggest seems to be the only way out. Calling out humans could imply that when a human reads a thousand books and then writes something based on them, but which is not a substantial copy of anything explicitly read, that human owns the copyright to the text written. Whereas if an artificial neural network does the same (hypothetically writing the same text), it would not.

The above does not follow from, imply or conclude anything about learning in artificial neural networks and humans being similar or dissimilar.

nkrisc 15 hours ago||||
I find it more ridiculous to equate the act of a human learning with for-profit AI training without recompense to the authors of the training material.
lo_zamoyski 16 hours ago||||
If that were the case, then imagine having to give it back!
estimator7292 16 hours ago||||
Learning, probably not.

Copy/pasting at scale, yes

vorticalbox 16 hours ago|||
It is learning though. It’s not just copying the code.

Code gets turned into tokens and then it learns the next most likely token.

The issue that I see most people talk about is the scale at which it is learnt.

A human will learn from other people’s code, but not from every person’s code.
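The "next most likely token" idea can be illustrated with a toy bigram counter. This is only the counting intuition, with an invented mini-corpus; real models learn a neural network over subword tokens and far longer contexts:

```python
from collections import Counter, defaultdict

# Invented mini-corpus, pre-split into tokens for simplicity.
corpus = "for i in range ( n ) : print ( i )".split()

# Count which token follows which in the training data.
successors = defaultdict(Counter)
for cur, nxt in zip(corpus, corpus[1:]):
    successors[cur][nxt] += 1

def predict_next(token):
    """Return the most frequent token seen after `token` in training."""
    return successors[token].most_common(1)[0][0]

print(predict_next("range"))  # '(' -- the only token ever seen after 'range'
```

Scaled up by many orders of magnitude, with learned weights in place of raw counts, this is the sense in which the model "learns" statistics rather than storing files, though as discussed elsewhere in the thread, verbatim memorization can still occur for heavily duplicated text.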

cogman10 16 hours ago|||
The issue is that of copyright law WRT derivative works. Machine transformations of original works do not create a new copyright for the person that directed the machine transformation. That's why you can't pirate a bunch of media by simply adding a red pixel to the right-hand corner or by color-shifting the video.

Copyright law is very clear that if a machine does it, the original copyright on the input is kept. This is why your distributed binaries are still copyrighted, because the machine transformed, very significantly, the source code into binary which maintains the copyright throughout.

It would be inconsistent for the courts to suddenly decide that "actually, this specific type of machine transformation is innovative."

I know this is generally really bad for the AI industry, so they just ignore it until a court tells them they can't anymore. And they might get away with it as I don't have faith that the courts will be consistent.

red75prime 14 hours ago||
Shredding is a machine transformation. Does it mean that shreds retain original copyright even if the content can't be restored and the provenance can't be traced? Just an example that treating all machine transformations equally with no regard to the specifics doesn't make much sense.

And the specific nature of autoregressive pretraining is that it is lossy compression. Good luck finding which copyrighted materials have made it into the final weights.

cogman10 14 hours ago||
> Does it mean that shreds retain original copyright even if the content can't be restored?

Yup, it absolutely does. In fact, that's why you are still violating copyright law by using bittorrent even though each of the users is only giving out a small slice or shred of the original content.

The US has a granted defense in the case of something like shredding called "Fair Use" but that doesn't mean or imply that a copyright is void simply because of a fair use claim.

> And the specifics of autoregressive pretraining is that it is lossy compression.

That doesn't matter. Why would it? If I take a FLAC recording and change it to an MP3, the fact that it was a lossy transform doesn't suddenly give me the legal right to distribute the MP3.

> Good luck finding which copyrighted materials have made it into the final weights.

That's what the NYT v. OpenAI lawsuit is all about. And for earlier models they could, in fact, pull out full NYT articles which proved they made it into the final weights.

Further, the NYT is currently in discovery which means OpenAI must open up to the NYT what goes into their weights. A move that, if OpenAI loses, other litigants can also use because there's a real good shot that OpenAI also included their works in the dataset.

red75prime 14 hours ago||
> Yup, it absolutely does

Well, it's not the first time the law contradicts the laws of nature (for the entertainment of future generations). BitTorrent is not a relevant example, because the system is designed to restore the work in its fullness.

> in fact, pull out full NYT articles

That's when they used their knowledge of the exact text they wanted to "retrieve" to get the text? It wouldn't be so efficient with a random number generator, but it's doable.

cogman10 13 hours ago||
> Bittorent is not a relevant example, because the system is designed to restore the work in its fullness.

You can restore shredded documents with enough time and effort. And if you did that and started making photocopies, even if they are incomplete, you would run afoul of copyright law.

Bittorrent is a relevant example because it shows that shredding doesn't destroy copyright.

Remember, copyright is about the right to copy something. Simply shredding or destroying a thing isn't applicable to copyright. Nor is giving that thing away. What's applicable is when you start to actually copy the thing.

red75prime 13 hours ago||
I meant idealized shredding: a destructive transformation, which is still a machine transformation (think blender instead of shredder). When you need exact knowledge of a thing to make its (imperfect) copy using some mechanism, it doesn't mean that the mechanism violates copyright.

EDIT: I'm not saying that neural networks can't rote-learn extensive passages (it's an effect of data duplication). I'm saying that they are not designed to do that, and it's possible to prevent it (as demonstrated by the latest models).

cogman10 11 hours ago||
I'd assume it's still a copyright violation if you copied and distributed the shredded copy.

The way I arrive at that is imagine you add just 1 pixel of static to a video, that'd still be a copyright violation. Now imagine you slowly keep adding those random pixels. Eventually you get to the point where the whole video is just static, but at some point it wasn't.

Now, would any media company or court sue over that? Probably not. But I believe it still falls under copyright (but maybe fair use?).

The issue with neural networks is they aren't people. Even when you point your LLM at a website and say "summarize this", the output of that summarization would be owned by the website itself by nature of its being a machine-transformed work.

Remember, it's not just rote recitation that violates the law; any transformation counts as well. The fact that AI companies are preventing it doesn't really solve the problem that they are in fact transforming multiple copyrighted works into their responses.

red75prime 8 hours ago||
When you point your browser at a website the browser creates a (transformed) local copy of the information that is owned by the website itself. The browser needs to do that to render the website on your screen. Is it a violation of copyright (that the website is willing to tolerate because it profits from advertisements)?
cogman10 3 hours ago||
No, because your browser is dealing with the distribution of data in a way intended by the copyright holder. You also aren't redistributing the webpage after rendering. Client side modifications fall under fair use which is what keeps the likes of ad blockers and other page modifiers legal.

What would violate copyright is if you took that rendered page, turned it into a JPEG, and then hosted that JPEG from your own servers. That's the copying that would run afoul of copyright law.

blks 15 hours ago|||
A human is not a commercial product. Here we have a commercial product that was created using a lot of various copyrighted and protected IP, without licensing agreements, without paying, without even citing it.
margalabargala 16 hours ago|||
Copy/pasting at scale is how tons of software has been written for a long time, or have we all forgotten the jokes people used to make about StackOverflow?
pydry 16 hours ago||||
If you can set a copyright trap and an LLM reproduces it I think it's pretty clear cut that it's more than just "learning".

I have seen LLMs do all sorts of crap which was clearly reproduction of training material.

This is also why people are most impressed with how much better it is at reproducing boilerplate rather than, say, imaginative new ideas.

jakeydus 15 hours ago||
Remember last year (?) when one of the major AIs produced a bit of code that included Jeff Geerling's name in a comment?
MagicMoonlight 14 hours ago|||
If I “learned” your essay and handed it in, would you be happy with that?
NewsaHackO 11 hours ago||||
Everybody has done a complete 180 on copyright protections. Before, nobody cared about downloading music, movies, or TV shows, or pirating games. Now, when copyright law is affecting them, they are gung-ho about protecting these billion-dollar companies' copyrights.
jeppester 4 hours ago|||
A more logical explanation would be that there are different opinions and those who complain are usually louder.
preisschild 3 hours ago|||
It's not only about "billion-dollar companies' copyrights", but also about voluntary copyleft free software. If I license my code under the GPL, I don't want other persons/companies to just whitewash that code through LLMs and use it in their proprietary code.
Neywiny 17 hours ago||||
I don't think it's unreasonable to consider it stolen potential profit, but agreed that's not how they spin it
blks 15 hours ago||||
“Stolen” as in “profited on IP against terms and conditions of the license”.
Aerroon 16 hours ago|||
Copyright isn't some natural state of being though, it's something that's granted to people by the government to "promote the progress of science and useful arts". If copyright hinders things then I think it's reasonable that exceptions would be made.
rectang 16 hours ago|||
Copyright laundering is an illusion.

If the LLM generates output that a court decides is sufficiently derivative, and especially (but not necessarily) if the LLM was trained on the source material being infringed, then whoever redistributes the derivative output is going to be liable for copyright infringement.

Creation of the LLM itself is transformative, but LLM output which infringes is not.

2ndorderthought 15 hours ago||
Is it true, then, that if someone stole an entire code base from a vibe-coded app derived from a non-permissively-licensed project, and claimed that it was LLM output and not stolen at all, the person who took the code is not a thief because it came from the same place? Or are they a thief because someone else copyrighted it first? How do vibe coders protect themselves without knowing who else has the same derivative code or who holds the copyright first? Or can't they?
leptons 13 hours ago||
The only thing a vibe coder should be able to copyright is the prompt text they wrote. Not the output of the LLM, only the text they wrote to instruct the LLM what to do. And even that is pretty iffy, because most of it, like "put a button on a page", is not copyrightable.
KallDrexx 11 hours ago|||
Do you think that human directing the agent owns copyright for any legal reason?

The case Community for Creative Non-Violence v. Reid (https://en.wikipedia.org/wiki/Community_for_Creative_Non-Vio...) solidified the Supreme Court's position that someone commissioning a work and directing its author does not gain authorship; authorship goes to the person actually doing the work.

The author can grant authorship and copyright to the commissioner with a contract, but the monkey picture case (and others) solidified that only humans can be granted copyright. Since LLMs aren't human, they can't hold copyright, and if the LLM doesn't hold copyright, then it has no legal rights to assign copyright to you.

zarzavat 5 hours ago|||
It depends on what level of creative control you had over the code.

Code is protected by copyright as a literary work. The method is not protected by copyright, that would be the domain of patents. What's protected are the words.

If you say "Claude, build me a website about X" then you do not have any creative control over the literary work Claude is producing. You just told a machine to write it for you. Nor, unlike a compiler's output, is it derivative of any other work that you wrote.

If, on the other hand, you are working jointly with Claude to make specific changes to the code on a line-by-line basis, then you will have no problem claiming copyright over the code. Claude in this case is acting as a tool, but there's still a human making decisions about the code.

In the case where you wrote a bunch of markdown and then told Claude to generate the corresponding code but didn't have any involvement in writing the code itself, you could perhaps claim that the code is a derivative work of the markdown. A court would have to handle that on a case-by-case basis and evaluate how much control you exerted over the work.

Animats 4 hours ago||||
> only humans can be granted copyright.

No, a copyright application can be filed with a corporation listed as the author. Watch for the copyright notice at the end of the next major movie you see.

chrisred 4 hours ago||
However, until very recently the creative product must have been created by someone, so there was an implicitly created copyright over the product in the first place. With AI output, that might not continue to be true; we don't really know how it'll work out yet.

In any case, the corporation did not create the product, people created it and their contractual relationship with the corporation defined how the ownership of that work was managed. So, I don't find it too unusual that this element of personhood is available to corporations.

marcus_holmes 6 hours ago|||
Interesting, though, that ownership of the code can still be transferred to the employer. So it's in the public domain (because not human authored) but owned by the employer (because the human and/or LLM was employed by the employer)? I don't really understand how this works.
p_l 4 hours ago|||
Copyright works on derivative rules: is the component of the work unmistakably derived from another copyrighted work?

Under at least the EU AI Act, any work done by AI is not granted copyright. But that does not mean copyright does not apply; it means the amount of work credited to the AI is set at 0% (a simplification). A human working off another's work, unless it's a perfect copy, will have "credit" for changes that are judged creative/transformative, meaning a human plagiarizing something can still claim some degree of authorship. An AI won't.

In a sense, the copyright status of the final work is a sort of "sum with dilution" where each work involved adds to the claims, but the AI's output is set at 0; the prompt or further rework by a human is not.

As for the employer, details vary, but generally "work for hire" rules and contracts handle reassignment of material rights (in the EU and some other places you cannot reassign moral rights, which are a different thing).

alok-g 6 hours ago|||
Note: IANAL

I think what this means is that the employee may not be the copyright owner for multiple reasons, which are possibly applicable simultaneously. It does not imply that the employer owns copyright over the work that is in public domain, which would be a contradiction.

CWuestefeld 17 hours ago|||
> but the ability for the agent to build it in the first place is based off of stolen IP

I honestly don't understand why the attitude that underlies this is so prevalent.

When I write code, what I write and how I write it is informed by having read countless source code files over my education and my career. Just as I ingest all that experience to fine-tune how my later code is written, so does the LLM from the code it's seen.

The immediate retort to that is that the LLM is looking at code that wasn't its to read. But I don't think that's a valid objection. Pretty much by definition, everything I've learned from has a copyright on it, and other than my own code on my own time, that copyright is owned by someone else. Much of the code that's built up my understanding has been protected by NDA, or even defense-department classifications: it wasn't mine in any way. But it still informs how I do all my future coding.

By analogy: I'm also an artist, especially since my retirement. My approach to photography was influenced by Ansel Adams, and countless other artists whose works I've seen displayed in museums, or in publications and online. My current approach to painting was inspired by Bob Ross and others, and the teachers who have helped me develop. I've taken pieces of what I've seen in all their work, and all of that comes out in my photos and paintings, to varying degrees.

I've taken ideas from others in code and in art, and produced something (hopefully!) different by combining those bits with my own perspective. I don't think anyone has a claim on my product because of this relationship.

Likewise, I know that many of my successors have learned from my code (heck, I led teams, wrote one book about software development!). And I hope that someday my artwork has developed to the point where there's something in it that's worth someone else's attention to assimilate. I've never for a minute - even decades before the advent of LLMs - hoped or even imagined that my work would remain locked up with me, and that the ideas would follow me to the grave.

As they say, we are all standing on the shoulders of giants. None of us would be able to achieve the tiniest fraction of what we have, without assimilating what has come before us. Through many layers of inheritance it's constantly being incorporated in subsequent works.

In a few decades at best, I'll be dead. It probably won't be very long after that when people even forget my name. But the idea that something I've done - my work in developing software systems, or in my photography and painting - will continue to have ripples through time, inspires me and gives me hope that I'll have some tiny shred of immortality beyond my personal demise.

demorro 16 hours ago|||
Humans should have more legal privileges than machines, just as individuals should have more legal privileges than corporations. It's really as simple as that. I don't want to gripe around making up justifications, that's how the law should be and if it turns out not to be that, I'm going to be nettled.

I live in the UK, and most US law is based upon English common law, it's not some immutable code given to us from above. It's based upon assumptions and capabilities of the entities participating in the system at the time the law was codified. It can and should change to make more sense if those assumptions and capabilities shift massively.

idle_zealot 15 hours ago||
I get the individual/corporation distinction, but how is a machine another tier here? It's a tool, it can't have any rights at all. The wielder has rights, and curtailing their rights depending on what tool they're using to exercise them seems strange. Potentially justifiable, but it's a different axis from the nature of the actor.
demorro 14 hours ago||
Our positions are completely compatible. People are anthropomorphizing LLMs, saying that because humans train on protected works, then it is fine for LLMs to do the same.

If they have only the rights that their human creators have, then access to them cannot be sold, in the exact same way that I cannot sell you a database that I have collected filled with copyrighted material. The "humans do training too" argument only holds if you imbue LLMs with similar rights to humans.

I am allowed to sell myself (in a very limited capacity) to others for them to exploit my training, even if that training was on protected material, which is a privilege humans should have, but machines should not.

p_l 4 hours ago||
Thing is, the LLM's level of compression of the training set means that, effectively, under the same rules that say you cannot sell that database filled with copyrighted material, the LLM is fine to sell. Because you have to be able to meaningfully trace each claim to the final output (the weights). For example, for some older Stable Diffusion model, it was calculated that each individual work's addition or removal resulted in about 1-2 bits of change, meaning the same rules would qualify it as not a derivative work.
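A back-of-the-envelope sketch of why a figure that small is plausible (all numbers below are assumed round values for illustration, not the actual study's measurements): dividing a model's total weight capacity by the size of its training set bounds how many bits any single work could contribute on average.

```python
# Rough upper bound on how many bits of the weights any single training
# work could account for. Parameter count, precision, and dataset size
# are assumed round numbers, not measurements.

model_params = 860e6       # assumed U-Net parameter count for an older Stable Diffusion
bits_per_param = 16        # fp16 weights
training_images = 2.3e9    # assumed LAION-scale training set size

total_bits = model_params * bits_per_param
bits_per_image = total_bits / training_images  # roughly 6 bits per image

print(f"At most {bits_per_image:.1f} bits of the weights per training image")
```

The 1-2 bit figure quoted above came from an empirical addition/removal calculation; this crude capacity bound only shows the order of magnitude is consistent.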

However, because this conflicts with the (at least historical) goals of copyright law, the common pattern that is evolving is that AI is not granted copyright on any work it generates, making it a bit of a poison pill for some of the more egregious ideas of corporate abuse. Not sure if the weights will be considered copyrightable either.

missingcolours 16 hours ago||||
In many of those examples, there is payment to the creator of the works that others are learning from. Authors are paid for their books, when we listen to music on the radio the musician is paid royalties, etc. When you lead a team and mentor junior engineers you're being paid for your time.

The nature of the source material matters though. Training a model on open source software seems perfectly fair - it has explicitly been released to the public, and learning from the code has never been a contested use.

IMO the questions around coding models should be seen as less about LLMs and more as a subset of the conversation about large companies driving immense profits from the work of volunteers on open-source projects, i.e. it's more about open source than AI.

jacquesm 17 hours ago||||
Scale, and the ability to earn a livelihood from your creations, and/or the ability to control how what you have created is used, for instance to demand attribution.
atleastoptimal 16 hours ago||||
The attitude is derived from a general animus many have towards AI companies. They resent the efficacy of AI because it devalues individual expertise.

I can't imagine it's really justifiable to say that training off data is the same as "stealing", when that same claim, that learned information a person could retain and reproduce constitutes copyright infringement, is the subject of many dystopian narratives, like this one, where once your brain is uploaded to the cloud you have to pay royalties on every media product you remember.

https://www.youtube.com/watch?v=IFe9wiDfb0E

RealityVoid 13 hours ago|||
This is the answer. People don't like having their livelihood threatened so they kick the thing that threatens it.
mattmanser 16 hours ago|||
Part of how AI works is that it's just really complicated compression; you can get AI to write out Harry Potter novels word for word with the right prompting.

When it picks out a rare bit of code, it will simply be copying that code, illegally, and presenting it without attribution or any license, which is in fact breaking the law, but AI companies are too important for the law to apply to them.

There's been instances where models have spat out comments in code that mention original authors, etc., effectively outing itself as a copyright thief.

There's nothing anyone can do about it, but the suspicion is that the big companies have taken everyone's code on GitHub, without consent, and trained on it.

And now they are spitting out big chunks of copyrighted code and presenting it as somehow transformed, even though all they've actually done is change a few variable names.

It is copyright theft, but because programmers are little people, not Disney, we don't have any recourse.

CWuestefeld 13 hours ago|||
> And now they are spitting out big chunks of copyrighted code and presenting it as somehow transformed, even though all they've actually done is change a few variable names.

It's pretty likely that I've done the same thing. I mean, I've written enough CRUD functions in my life, for example, that in all likelihood I'm regurgitating stuff that's a copy, for all practical purposes, of stuff I've done before as work-for-hire for my employer. I'm not stealing intentionally or consciously, but it seems quite likely that it's happening. And that's probably true for many of you, at least that have been in the industry for a while.

winstonwinston 15 hours ago||||
> There's nothing anyone can do about it, but the suspicion is that the big companies have taken everyone's code on GitHub, without consent, and trained on it.

I asked agent X what is the source of training data it generated code from, it couldn’t say. Then I asked why the code implementation is exactly the same as the output of agent Y. It said they were trained on the same ‘high-quality library’, and still couldn’t say which one.

So I guess that’s fine because everyone is doing it.

atleastoptimal 16 hours ago|||
Anthropic was sued successfully for training on books; the law still applies to them.

https://www.npr.org/2025/09/05/g-s1-87367/anthropic-authors-...

When I write fizzbuzz do I owe royalties to the inventor of fizzbuzz? Is my brain copyright thieving because I can write out the song lyrics from memory?

veber-alex 6 hours ago|||
They got sued for downloading pirated books and not for using them for training. Huge difference.
blks 15 hours ago|||
I think if you write fizzbuzz and then sell it, without attribution, and it goes against the original fizzbuzz license, then you’re infringing.
blks 15 hours ago||||
You’re confusing yourself with a commercial product. You’re not a product that was created by other human beings based on someone else’s IP.
CWuestefeld 13 hours ago||
> You’re not a product that was created by other human beings based on someone else’s IP.

It turns out that's false. We know that genes are patentable; remember back during the Human Genome Project, when there was such a rush to patent them? So genes are IP. (This seems bizarre to me, since they're patenting something that was found just sitting there, but this is what the system says right now.)

Well, two other humans (aka mom and dad) did create me, based on those patentable genes (and most likely including some genes that were, in fact, patented).

I'm not sure what to conclude from all of that, but I do think that it invalidates your argument.

rspeele 16 hours ago||||
For another human being to look at my open source code, learn from it, get inspired by it, appreciate what I did, and let it influence their own creativity would bring me joy. That's why I open sourced it in the first place.

Few people ever actually read open source code, but I'd like to think on the rare occasions they do, they share a connection with the author. I know when I read somebody else's code, for me to understand it I have to be thinking about the problem the same way they were when they wrote it. I feel empathy with them and can sometimes picture the struggle, backtracking, and eureka moments they went through to come up with their solution.

Somehow I don't get the same warm fuzzy feelings about a machine powered by investor money ingesting my work automatically, in milliseconds, and coldly compressing it down to a few nudges on a few weights out of trillions of parameters. All so the machine can produce outputs on-demand for lazy users who will never know of me or appreciate my little contribution, and ultimately for the financial benefit of some billionaires who see me as an obsolete waste of space.

I guess I'm just irrational that way.

all2 15 hours ago||
We're moving into the 'industrial age of software'. Your exact issue, of bespoke, well-thought-out and well-crafted code, is one that craftsmen felt at the beginning of the industrial age. Now, parts are designed and churned out by machines that no one sees or cares about (generally speaking). This is where we are going with software, and production at a truly industrial scale has its place.

And so does well-crafted bespoke software.

The engineers who built the foundation for the industrial expansion of our forefathers went through the same exact thing we're going through now. They looked at what existed and used it to inform their efforts. This is what LLMs do.

I'm not attempting to moralize here, just to comment on the parallels. Do I agree that a craftsman's work is consumed by the juggernauts and no second thought is given? No. I think it's a shame. But I also think the output will never match that of the artisans who practice now. By the very nature of the machines we employ, we cannot match the skill or thought that goes into bespoke code.

rspeele 7 hours ago||
It is not even about quality. In fact with an LLM following my orders I can create higher quality code than I ever did before. I always was operating within a budget whether it was defined by the # of hours my customers were willing to pay for, or the # of hours I was personally willing to invest in a side project. This budget manifested in the form of cut features, limited test coverage, limited documentation, and so on. So given the same budget or even a slightly reduced budget I can actually make higher quality software with slop superpowers.

If I spend 2 hours designing the domain model, 1 hour slopping out a rough implementation, and 5 hours polishing it with a combo of handwritten and vibed refactorings, I will get a better result than if I spent 8 hours writing everything by hand.

So my point is not that vibe software is lower quality, as my experience has shown the opposite. It is simply that the spirit of sharing my work was done with the idea that I was sharing it with others who toiled in the same craft, not sharing for consumption by machine. Not that I ever contributed anything very important to the open source world, that anybody depended on. Just personal projects I thought were neat or educational.

In hindsight I would probably still have open sourced what I did, because I think it's valuable to have on record that I competently programmed stuff before AI even existed, like pre-atomic steel. But I don't know if I will open source any personal code going forward.

====

To put it more succinctly: if somebody "ripped off" my open source code in 2018, I wasn't mad about that. Even if they didn't bother to attribute me, well, at least they saw my stuff, had a human brain cell light up appreciating it, and thought it was worth stealing. I'm flattered. But with LLMs my work can be reappropriated without a single human ever directly knowing or caring about it.

gspr 16 hours ago||||
> When I write code, what I write and how I write it is informed by having read countless source code files over my education and my career. Just as I ingest all that experience to fine-tune how my later code is written, so does the LLM from the code it's seen.

You are presumably human. We have granted humans specific exemptions in copyright law. We have not granted that to LLMs. Why are we so eager to?

p_l 4 hours ago|||
We did not grant humans exemptions in copyright law.

We gave humans a temporary monopoly on certain uses, under rules little understood by laymen even when their livelihood depends on them.

gspr 2 hours ago||
... and from that temporary monopoly humans have exemptions (critique, inspiration, etc.)
RealityVoid 13 hours ago||||
OK, so I use the LLM. I use the tool. Can I now apply the exemption to myself?

Are you telling me that I can use the thing, but I can't use it if I process it through an LLM? It gets slippery, fast.

gspr 2 hours ago|||
What's special about LLMs in your argument? When I was an edgy teenager in the 90s, I'd argue that it's not piracy because the DivX representation of the movie isn't bit-for-bit identical to the Hollywood master or whatever. If your reasoning works for LLMs as the tools, surely it also works for video compression.
habinero 9 hours ago|||
No, that's how copyright normally works.

If I write a story, I can put it online. That doesn't mean it's ok to take that story and publish it in an anthology.

CWuestefeld 13 hours ago||||
I'm not sure where in our lawbooks there are laws that specifically target humans to the exclusion of human-operated tools.

There's also a TON of irony here. What an about-face it is, for the community at large* to switch from "information wants to be free, we support copyleft and FOSS" to leaning so heavily on an incredibly conservative reading of IP law.

lelanthran 5 hours ago|||
> I'm not sure where in our lawbooks there are laws that specifically target humans to the exclusion of human-operated tools.

It doesn't need to. Laws are for humans.

Laws don't give rights to chainsaws. Or lawnmowers. Or kitchen knives, hammers, screwdrivers, and spades.

You can't use any of those to commit a crime and then claim that the law specifically did not exclude those tools.

Why are you seemingly in favour of carving out an exemption for LLMs?

Laws are for humans.

Arguing that the law did not specifically address "intentionally killing a person by tickling them till they died" means that you found a loophole which can be used to kill people is...

well, it's in the "not even wrong" category...

gspr 2 hours ago|||
> I'm not sure where in our lawbooks there are laws that specifically target humans to the exclusion of human-operated tools.

If we take the point of view that LLMs are tools (I agree), then people need to be absolutely certain that these tools don't contain (compressed) representations of copyrighted works.

People seem not to want to do that. And they argue that the LLMs have "learned" or "been inspired" by the copyrighted works, which is OK for humans.

This is the problem. People can't even agree on which of two mutually exclusive defenses to appeal to! Are LLMs tools which we have to ensure aren't used to reproduce copyrighted work without permission, or are they entities that can be granted exemptions like humans can? It can't be both!

> There's also a TON of irony here. What an about face it is, for the community at large* to switch from "information wants to be free, we support copyleft and FOSS" to leaning so heavily on an incredibly conservative reading of IP law.

True. While IP-owning companies like Microsoft now say "it's online, so we can use it".

It's bizarre.

I'll tell you what: I'll drop my conservative stance in defense of FOSS when Windows and the latest Hollywood movie are "fair use" for consumption by whatever LLM I cook up.

ako 15 hours ago|||
Because that allows us to create useful tools that we didn't have before. For me it feels like a carpenter going from a hand-saw to an electrical saw. Still requires the skills of a good carpenter, but faster and easier.
gspr 15 hours ago||
… so a bunch of people just decided that rights we granted to humans also apply to their tools? Without any discussion? This isn't how anything is supposed to work when it comes to common rules!
RealityVoid 13 hours ago||
The common rules are so because we agree on them. On principle, in this case, we do not agree on what the rule should be here, and it's in a way unprecedented. We'll soon converge on a societal agreement. I hope society abstaining from tools will not be the answer.
gspr 2 hours ago||
And the process by which we agree is lawmaking.
ako 16 hours ago|||
I've created my own DSL, and instruct Claude Code how to generate code for this DSL using skills.

Since this is a new language, and not documented on the web nor on GitHub, Claude's ability is not based off of stolen IP. At best it's trained on other language concepts, just as we can train ourselves on code on GitHub.

Maybe a good reason to create a new programming language?

alok-g 6 hours ago||
Interesting, but I still do not think this is that easy. The AI model is still trained on some existing works, and it is generating code in the new DSL or programming language still based on some higher-level ideas and expressions it has consumed during training. You have added just one more level of indirection. The output can no longer be a verbatim copy of some existing work or of non-short snippets; however, the output may still carry "expressions" that are substantially similar to something pre-existing.

Note: IANAL. The above is just from my current understanding.

amelius 3 hours ago|||
I wonder what OSS licenses would have looked like if we saw all of this coming.
cess11 53 minutes ago|||
The LLM is just a database. It's like saying 'I own the copyright to what comes out of an API because I crafted the query' or 'I own the copyright to the responses I get from the bots on the Starship Titanic because I crafted the message they respond to'.
amarant 16 hours ago|||
I could possibly see an argument for the owner being whoever paid for the tokens used, but honestly I think the argument for that is weaker than what you're suggesting; I'm merely playing devil's advocate here.

I don't think there's even a valid argument for any other ownership model, or at least none that I can think of.

jmaw 16 hours ago||
I see the argument for whoever paid for the tokens. Or in the case of a free AI usage, the person who sent the prompt (or whoever they are acting on behalf of, i.e. the company they are working for at the time).

The primary issue being that it's all built on stolen data in the first place.

pc86 15 hours ago||
Even taking the least generous interpretation of what LLMs do and saying they're just "copy/pasting others' code" it's still not stealing because the original still exists and presumably still makes money. The original has to be gone for theft to have occurred.

In order to have a sane conversation about this we have to all agree not to lie.

jacquesm 17 hours ago|||
No, that human owns the copyright on the prompt, not on the work product.
alok-g 6 hours ago|||
If that were true, a developer may own copyright over the source code but nothing on the compiled binaries, and I could download practically all available software as compiled binaries and use it for free.
jacquesm 5 hours ago|||
Indeed a developer owns copyright over the source code and over the compiled binaries, because there is no expansion happening here, just a translation from one format into another, the kind of thing that has been ruled copyrightable for as long as copyright has existed. The same goes for translations from one human language into another, and anybody with knowledge of more than one language will happily acknowledge that translating is hard work. Even so, the translator does not hold copyright on the result; at best they can say they have created a derived work, and it is the original author who continues to hold copyright.

Compilation and translation happen in a generic manner and do not rely on a mountain of other IP; the compiler is really just a transformative tool that happens to do something useful. Someone constructed it to be a very precise translation, to the point that any mistakes in it are called bugs and we fix them to ensure the process stays deterministic. Translators try hard to 'get it right' too: to affect the intentions of the original author as little as possible.

When you use a model loaded up with noise or that you have trained exclusively on code that you actually wrote I think a strong case could be made that you own the copyright on that work product. But when you train that model on other people's work, especially without their consent or use a model that has been trained in that way you lose your right to call the output of that model yours.

You did not write it, and the transformative process requires terabytes of other people's IP and only a little bit from you.

As soon as you can prove that your contribution substantially outweighs the amount of IP contributed in total you would have a much stronger case.

alok-g 4 hours ago|||
>> No, that human owns the copyright on the prompt, not on the work product.

I think I may have misunderstood your original comment above. It seems intending to say:

No, that human owns the copyright on the prompt, not necessarily on the work product. The human may partially have copyright over the work product as well, "how much" being dependent on how much new creative expression from the human was involved vs that from others.

p_l 4 hours ago||
That is in fact correct.

Both the compiler (in the absence of included copyrighted libraries) and the LLM are considered not to add creative work, and thus do not change the copyright status of the works they transform.

You can consider the training set of the LLM or other AI model to be 3rd-party libraries, and the level of copyright from them that applies to the final output to be how much can be directly considered derivative, just as reading copyrighted code and being inspired by it does not pass that copyright to your work unless yours is obviously derivative.

alok-g 4 hours ago||
>> You can consider the training set of the LLM or other AI model to be 3rd party libraries ...

I like this comparison -- the training set as '3rd-party libraries'. Except, of course, that the authors behind the training set may not have actually granted permission to use their work, whereas 3rd-party libraries usually come with some permission by way of a license.

alok-g 4 hours ago|||
+1

Adding two subtle points:

>> Indeed a developer owns copyright over the source code and on the compiled binaries, because there is no expansion happening here but just a translation from one format into another ... does not rely on a mountain of other IP

... and, the license agreements of the compiler and the libraries used / linked to practically always explicitly waive copyright claims over the said non-mountain of IP.

>> As soon as you can prove that your contribution substantially outweighs the amount of IP contributed in total you would have a much stronger case.

... a much stronger case that you have a partial copyright over the work, which is now likely a derivative work. You still may not have a case that you own the copyright exclusively (or as the original article says, that your employer does).

lelanthran 4 hours ago|||
> If that were true, a developer may own copyright over the source code, but nothing on the compiled binaries, and I could download practically all software available as compiled binaries and use for free.

If the compiled binaries (output) were produced by running the input (source code) over every program written, then sure.

But that's not what's happening with compilers, is it? The output of a prompt is dependent on copyrighted work of others every single time it is run.

The output of a compiler is not dependent on the copyrighted output of every other program.

alok-g 4 hours ago||
I think your comments are originating in how I may have taken jacquesm's comment too literally, as I just wrote here https://news.ycombinator.com/item?id=47944938

However:

1. The "every"ies in your comment are not to be taken literally either. :-)

>> If the compiled binaries (output) were produced by running the input (source code) over every program written, then sure.

2. More importantly, the above seems cyclically dependent on whether output from generative AI is deemed to be in public domain or not, which I consider is an open-ended issue as of now. It is not so 'sure' as yet. :-)

keithba 16 hours ago||||
That’s not how it works. The human using the tool (like Claude Code, etc.) owns the copyright on the code generated.
jacquesm 15 hours ago||
No, you are wrong about this.

See:

https://technophilosoph.com/en/2025/02/07/ai-prompts-and-out...

If you have a more recent citation referring to case law that states the opposite then that would be great but afaik this article reflects the current state of affairs.

The human using the tool creates a prompt; there is then an automatic transformation of the prompt into code. Such automatic transformation is generally accepted not to create a new work (after all, anybody else inputting the same prompt would have a reasonable expectation of generating the same output, modulo some noise due to versioning and possibly other local context).

Claude Code, and AI-generated code in general, does not at present create a new work. But the prompt, the part which you input, may be sufficiently creative to warrant copyright protection.

keithba 15 hours ago||
In the US, the Copyright Office (as the article you link to says) has declined to define “meaningful” contribution. If you want to argue that the user doesn’t own it for incredibly trivial prompts, I won’t argue (though I consider that to be non-useful code).

Every developer I’ve seen use these tools has engaged in a meaningful contribution: specific directions across multiple prompts, often (though not always) editing the code afterwards, manually running the code and prompting for changes, etc.

Until the courts, legislators, or the copyright office define something otherwise, I’m highly confident of my assertion. (Mostly because of the insane number of hours I’ve spent with counsel on this. And, as a disclaimer, since I am biased: I worked on Copilot and Google’s various AI assisted coding products as an SVP and VP.)

jacquesm 15 hours ago||
If my business depended on a legal fiction being true and I had invested a whole pile of effort + money into it being so, then I would argue at every opportunity that 'of course it is legal'. But that's just a version of fake-it-until-you-make-it, and in practice not all of those bets pay off.

The fact that meaningful contribution has not been defined is a strong signal that things are not nearly as clear-cut as you make them out to be. Until there is a ruling that clearly establishes that the person who wrote the prompt owns the copyright on the code, I think it is misleading to suggest that this is already the case; your lawyers are not the lawyers of the parties that will end up hurt if it turns out not to be so.

For contrast: we have a very clear idea of what things are copyrighted, and in general those things do not rest on a foundation of IP appropriated from others outside of the license terms. The fact that the infringement is fine-grained and effectively harms the rights of thousands or more individuals doesn't change the heart of the matter: whoever wrote the code, it wasn't you.

Given your bias I'm not surprised that this would be your argument though, effectively you have created a copyright laundromat using code that you were nominally the steward of and not the owner but whether it stands long term or not is not up to your lawyers.

alchemism 12 hours ago|||
Prove I did not write my code if I do not tell you which tools I used. =}
jacquesm 11 hours ago||
That's not how that works.

You warrant that you wrote the code yourself; then it is found that your code infringes on code owned by other entities. Now you have a tough choice: admit you lied about writing your code yourself, tainting all of the code you claim to have written since these tools became available, or stand and take the infringement penalty, which could be very substantial.

Judges and courts don't like playing silly games like this.

I've sued two parties for copyright infringement and won, and a third settled out of court for a substantial sum. You don't tell a judge you don't need to prove you wrote the code; that's an automatic loss. Then there are such things as expert witnesses, who will interview you and check how much you know about the code you claim you wrote.

NewsaHackO 9 hours ago||
>I've sued two parties for copyright infringement and won and a third settled out of court for a substantial sum. You don't tell a judge you don't need to prove you wrote the code, that's an automatic loss. Then there are such things as expert witnesses who will interview you and check how much you know about the code you claim you wrote.

This doesn't really make sense; in no way can an "expert" interview definitively assert someone wrote a piece of code or not, especially if the person has access to the code beforehand.

habinero 9 hours ago||
They don't need to prove it 100%. They just have to show that it's likely you did.

I believe the standard can be as low as "more likely than not".

keithba 15 hours ago|||
Obviously, we aren’t going to agree on this at all. I hope you have a good day.
kridsdale1 16 hours ago|||
So I’m responsible for pushing the giant boulder at the top of the hill.

The humans at the bottom who were crushed should blame the boulder, which happened to be moving.

jacquesm 16 hours ago||
I'm not sure what point you are trying to make.
Aerroon 16 hours ago||
He's making a point about responsibility/liability.

If you only get copyright for the prompt you make, but not the output, then it's like being responsible only for the prompt, but not the output.

I.e., he's only responsible for pushing the boulder up the hill. The fact that it rolled down the hill and crushed someone's house "isn't his fault" (he doesn't get copyright on it).

generic92034 15 hours ago|||
That is not how responsibility works anywhere. If you steal a gun and murder someone with that gun, you are still responsible, even though it is not your gun.
jacquesm 15 hours ago|||
Well, you are responsible for the consequences. Liability is simply a different thing than copyright.
Aerroon 15 hours ago||
The copyright office says that you don't get copyright because you're not considered the author:

https://www.copyright.gov/ai/

>The Office concludes that, given current generally available technology, prompts alone do not provide sufficient human control to make users of an AI system the authors of the output. Prompts essentially function as instructions that convey unprotectible ideas. While highly detailed prompts could contain the user’s desired expressive elements, at present they do not control how the AI system processes them in generating the output.

If you're not the author then why would you have to be liable for it?

jacquesm 14 hours ago|||
> If you're not the author then why would you have to be liable for it?

If you do not understand this, make sure that you always operate within a framework of people who do, because this sort of misunderstanding can cause you a world of grief.

Because you are the person shipping it, and as such regular liability applies. If I'm not the author of a book and I make a lot of copies and distribute those, I'm liable for the content of that book, regardless of whether or not I hold the copyright to it. Conversely, if the original author sues because they feel the distributed work infringes theirs, that too is a liability that stems from the distribution.

And 'distribution' is a pretty wide term, not unlike 'interstate commerce', lots of things that you might not consider to be distribution can be classified as such in court.

Different laws do not come in packages, they apply individually, and sometimes they apply collectively but it isn't a menu where you can pick the combination that you think makes the most sense.

Aerroon 6 hours ago||
Oh, I do understand it - laws are contradictory and end up doing whatever the people who shout the loudest say they should do (though they don't always work that way). I just think that it is extremely bad when laws work this way.

Technically when you select "copy image" instead of "copy image url" and paste that to a friend you're often committing copyright infringement. Do I think this is reasonable? Absolutely not. The same goes for this - the author should hold liability, so make the person who ends up causing the work to exist the damn author.

But nooo, we can't have that. Instead we need to have these convoluted exceptions that don't at all work how the real world works, so that lawyers can have even more work.

Besides, if we go by "the law" then we already have a court case where training an AI model is protected by fair use. But obviously that isn't satisfying enough for people, so they keep talking about how it's stealing (refer to my first sentence).

Also, this situation is going to get funny when some country decides that AI generated content does get copyright protection.

lelanthran 4 hours ago|||
> Oh, I do understand it - laws are contradictory and can do whatever people shout out the most that they should do (but they don't always work that way). I just think that it is extremely bad when laws work this way.

You are completely misunderstanding GP's distinction between ownership and liability.

In short, if you use someone else's car to kill someone, you are still liable for killing that person even though you don't own the car.

Do you disagree with that statement?

jacquesm 4 hours ago|||
You can't really argue that things are in a certain way when that contradicts the way the law works, that's a recipe for disaster. The rules have been set, you can disagree with them and then you will be forced to litigate, which is both expensive and time consuming. Purposefully going against the grain is only for those with extremely deep pockets (and for lawyers...).

> Besides, if we go by "the law" then we already have a court case where training an AI model is protected by fair use.

Yes, but training an AI is a completely different thing than distributing the work product generated by that AI.

Note that I don't agree with all aspects of copyright law either, but I'll be happy to play by the rules as set today simply because I can't afford to be wrong and held liable for infringement. For instance I strongly believe that the length of copyright is a problem (and don't get me started on patents, especially on software). I also believe that only the original author should have copyright, not the company they worked for, their heirs (see Ravel for a really nasty case) or anybody else. I believe they should not be transferable at all.

But because I'm a nobody and not wealthy enough to challenge the likes of Disney in court I play by the rules.

As for 'this situation is going to get funny when some country decides that AI generated content does get copyright protection':

Copyright is one of the most harmonized legislative constructs in the world. Almost every country has adopted it, often without meaningful change. In practice US courts are obviously a very important driver behind changes in copyright law. But in general these changes tend to lean towards more protection for copyright owners, not less. So far the Trump admin has not touched copyright law in their usual heavy handed manner. I'm not sure if this is by design or by accident but maybe there are lines that even they can not easily cross without massive consequences.

Some parties in the AI/copyright debate are talking out of both sides of their mouth. For instance, Microsoft relies heavily on being able to infringe copyright at will, but at the same time they jealously guard their own code. Such hypocrisy is going to be the main wedge that those in favor of strong copyright use to reduce the chances that AI work product deserves copyright. After all, if the output is original and not merely derivative, then Microsoft could (and should!) train their AI on their own confidential code. But they're not doing that; maybe they know something you and I do not...

teddyh 15 hours ago|||
If you hold an illegal party on public land, you would still be liable, even though you did not own the land.
Aerroon 6 hours ago|||
But that's not at all a comparable situation, because it is your party. It doesn't matter where it is; we assign "ownership" of the party to you. Even the language we use states that explicitly. In the case of copyright, the Copyright Office explicitly states that you are not the author of an AI generated work.

The same point applies if an animal takes a picture.

jacquesm 14 hours ago|||
In some places simply not keeping the public street in front of property ice-free can incur liability, even when you are not actually there when it snows. There are so many such examples I'm kind of surprised to see this kind of confused argument made here.
saadn92 16 hours ago|||
I agree with this sentiment, because the person directing the agent can still direct it in a way where it'll produce a better or worse output than another person directing it.
jongjong 6 hours ago|||
This interpretation makes sense. I think even the 'fair use' clause in the US doesn't protect LLMs. One argument I've heard often is that LLMs synthesize their training set to produce novel output in the same way a human would... That may be the case, but legally an LLM isn't a human. You can't look at the output of an LLM and say that it's 'fair use' with respect to its training set; it hasn't been established that AI has the same 'fair use' rights as a human does, and it's already pushing it that companies have this right (let alone an AI agent). Anyway, that's just one problem... This is also ignoring the fact that the researchers who compiled the training set COPIED the original copyrighted data in order to produce it. They either copied the entire work into the training set or fed the entire work directly into the LLM; either way, at some point the entire work was copied verbatim into the LLM's input layer before it was ingested by the AI. The researchers copied the copyrighted content without permission.

Also, when it comes to code, the case is even more damning, because the vast majority of the code LLMs are trained on was not only copyrighted but subject to an MIT license (at best), and even the MIT license, the most permissive license in existence, still says clearly:

"Permission is hereby granted, free of charge, to any person obtaining a copy of this software"

The word 'person' is used very intentionally here.

I think there should be several kinds of AI taxes which should be distributed to all copyright holders. There should be a tax to go to writers (and book authors), a tax to go to open source developers and a tax for the general population to distribute as UBI to account for small-form content like comments and photography...

People invested a lot of time building their entire careers around the assumption of copyright protection; so for it to be violated on such a scale would be a massive betrayal.

varispeed 17 hours ago|||
I find the idea that code could be copyrightable weak. There are only so many ways to write a for loop. Similarly, you can't copyright schematics (apart from an exact visual representation as a form of art). Code is just a schematic.
alok-g 6 hours ago||
Note: IANAL

Copyright already precludes short phrases for the same reason -- there are only so many ways in which short phrases can be produced. The moment a work becomes large enough (AFAIK the threshold is not precisely defined), that reasoning ceases to apply.

The Google-Oracle lawsuit did not decide whether APIs (when large in number) are copyrightable or not.

jmyeet 14 hours ago||
You can think that's how it should be. But that's not necessarily how it is. I'm reminded of the famous monkey selfie copyright dispute [1]. A photographer set up a camera and gave it to a monkey but after a legal dispute, courts decided nobody owned the copyright.

I can totally see this applying here as well.

Now this doesn't resolve the issue of AIs being trained on copyrighted works it had no rights to. The counterargument is that this is a derivative or transformative work but I don't believe that's settled law at all.

[1]: https://en.wikipedia.org/wiki/Monkey_selfie_copyright_disput...

jugg1es 23 hours ago||
I want this question to have an interesting answer, but everyone knows that if this question ever goes to the courts, ownership will go to the people in charge with the money. The idea that Anthropic may not own Claude Code just because Claude wrote it is wishful thinking.
embedding-shape 23 hours ago||
Best part is, it's likely to have a different answer in every country, who knows what'll happen, not every country implicitly sides with the ones with the most money.
MarsIronPI 17 hours ago|||
Well, eventually it'll probably be added to the Berne Convention agreement or some such.
LawnGnome 17 hours ago||
That's my feeling on the endgame too, but it'll probably be a decade before we get anywhere near it.
adrianN 18 hours ago|||
Depends on where they pay their taxes generally.
beej71 21 hours ago|||
I love that genAI art will not be copyrightable and genAI code will be. The power of the Almighty Dollar at work.
senaevren 23 hours ago|||
The work-for-hire doctrine actually supports your intuition more than the AI authorship question does. The reason Anthropic likely owns Claude Code has little to do with whether Claude wrote it and everything to do with the employment contracts of the engineers who directed it. The DMCA takedown question is genuinely interesting though because DMCA requires the claimant to assert copyright ownership in good faith. If a court later found the codebase was predominantly AI-authored and therefore not copyrightable, the 8,000 takedowns could be challenged as bad faith DMCA claims. That is a different and more tractable legal question than the ownership one.
gpm 17 hours ago|||
I have trouble believing that the DMCA claims would be found to be in bad faith when they were made at a time when the question of what degree of human input is required to acquire copyright on AI-generated code hadn't been resolved at all.

It doesn't seem like bad faith to think that copyright is stronger than the courts end up thinking, just being mistaken.

senaevren 17 hours ago||
fair correction, updated the piece to reflect this. Bad faith under DMCA requires knowing the claim is false, not merely being wrong. A good faith belief in copyright ownership, even one that turns out to be mistaken, is a defense. The more accurate framing is that if the codebase is found to be predominantly AI-authored, the takedowns would fail on the threshold question of whether there is a valid copyright to assert, which is a different issue from intent.
rasz 22 hours ago||||
Work-for-hire doctrine doesn't automagically absolve you from IP law. Microsoft and Intel already learned this in the nineties when they paid the San Francisco Canyon Company to steal Apple code.

https://en.wikipedia.org/wiki/San_Francisco_Canyon_Company

LLMs are just code stealers, and will gladly generate Carmack's inverse for you, original comments included.

senaevren 17 hours ago||
The San Francisco Canyon case is a good example of exactly the right distinction. Work-for-hire determines who owns the output, but if the process of creating that output involved copying protected material, the infringement claim runs separately. The piece makes this point in the open source contamination section: owning the output and having a clean chain of title to the output are different questions. You can own AI-generated code and still have a copyleft problem in it.
CWuestefeld 16 hours ago|||
I can't see how that can work.

As a developer, the fact that my source code passed through a compiler - an automated tool - doesn't give the author of the compiler any claim on my executable code.

As an artist, the fact that I used, e.g., Rebelle to paint a digital painting, or that I used Lightroom (including generative AI to fill, or other ML/AI tools to de-noise and sharpen my image) in editing a photograph, doesn't give EscapeMotion, Adobe, or Topaz, any claims to my product.

Why, then, would there be any chance that use of a tool like Claude - a tool that's super-advanced to be sure, but at the end of the day operates by way of a mathematical algorithms - would confer any claims to Anthropic?

> If a court later found the codebase was predominantly AI-authored and therefore not copyrightable

Is figuring out the appropriate prompts to use in directing Claude qualitatively different than using a (much) higher-level abstraction in coding? That is, there was never any talk, as we climbed the abstraction ladder from machine code to assembly to Fortran or C to 4GLs to Rust etc., that the assembler/compiler/IDE builder would have any ownership claim on the produced executable. In what sense can Anthropic et al assert that their tool, which just transforms our directives to some lower-level representation, creates ownership of that lower-level representation?

conartist6 23 hours ago|||
It's not wishful thinking, and ownership isn't a foregone conclusion.

Sure the courts could mint a communist society with a few weird decisions about property rights, but this being the US do you really suppose that's likely?

There's really no legal question of any kind that models aren't people and therefore cannot own property (and also cannot enter into legal contract as would be required to reassign the intellectual property they don't and can't own)

wongarsu 23 hours ago||
The catch-22 is that the fact that models aren't people is only relevant if you treat them like a person - which is what the US Copyright Office's opinion does, treating the model similarly to a freelancer. If you treat the LLM as a machine akin to a camera, with the author expressing their existing intent through the tools of this machine, ownership is back on the table, more or less as it was before LLMs.
conartist6 23 hours ago||
Well if the camera in addition to choosing autoexposure also decided how to frame the shots, which lens to use, where to stand, and everything else salient to the artistry of photography -- all without direct human intervention, then I would think the situation would again be analogous. If the camera could do all that because an intern was holding it, the intern would still own the shots even if their employer gave them the assignment.

That's why the intern signs an employment contract that reassigns their rights to their employer!!

dfxm12 17 hours ago|||
They won't want to own code that is malicious/illegal/used in crime. Although it's really weird to me that no one (in LEO) seems to care that, for example, grok generates CSAM, revenge porn, and probably other illegal things, so they'll probably get to have their cake and eat it too.
bombcar 16 hours ago||
Those things have precise legal definitions, and it may not be entirely clear that an LLM can even generate them - especially in the USA, where the 1st Amendment covers things that many would think illegal (and that are illegal in other countries).
helterskelter 17 hours ago||
I'm not sure Anthropic would appreciate the liability that ownership would imply.
helterskelter 17 hours ago||
Too late to edit, but OpenAI certainly doesn't want ownership or liability, for the CSAM they've produced. They certainly don't want ownership/liability of code which does $ONLYAWFULTHING.
alienll 10 hours ago||
This is the same shape as the image cases.

Zarya of the Dawn already settled it for Midjourney output: human-written elements were protected, AI-generated images were not. The character design didn't get copyright even though the human picked, prompted, and curated. Code isn't different. Prompting Claude to produce a function is closer to prompting Midjourney to produce a frame than to writing the function yourself.

The reason it feels different to engineers is that we're used to thinking of the compiler as the analogy. But a compiler is deterministic — same input, same output. An LLM isn't. That's the line the Copyright Office is drawing, and image cases got there first.

JAlexoid 3 minutes ago||
AFAIK: Even the slightest modification of the work is transformative and will produce copyrighted material.

It does not have to be substantial transformation.

protocolture 8 hours ago|||
Depends on the scale of LLM involvement. The copyright office left a pretty big carve-out for things that are human-sourced and then modified by an LLM, or the reverse, LLM output that's modified by human intention. (They had to do this because there are already pseudo-random elements in digital artwork, like render clouds and render noise, that might otherwise poison an artwork.) In fact I don't think this has been tested with "highlight area > prompt a change to this area of the image" workflows.

They also mention in the same document that were LLMs to more closely approximate deterministic tools, they would be open to reevaluating. That is, requesting X gets X, without substantial wiggle room.

I don't think that last part has been tested with an extremely large set of prompts and human-generated input used to create a more deterministic output. Even outside of code, where you see large prompts, creative-writing LLM tools like NovelAI or Sudowrite can have pages and pages of spec for the LLM, sometimes close to 50% of the size of the final output.

Then there's testing, review etc, human processes confirming that the output meets spec, updating it where needed intelligently.

There are also foreign courts, with similar rules about human intention, that have found in favor of prompts only, where it could be demonstrated that multiple rounds of prompts were used to refine the image.

I wouldn't call this settled at all, tbh. And to be honest, a lot of this doesn't require exposure: you don't need to own up to LLM use in a lot of settings, and proving LLM use is so difficult that it's easy to jump up the ladder from LLM (100%) to LLM (50%) and ultimately claim ownership.

The people who will get busted for this are basically just the super lazy: leaving ChatGPT responses in, failing to pay an editor, failing to modify images for anything more than layouts.

FrostKiwi 9 hours ago|||
> But a compiler is deterministic — same input, same output. An LLM isn't.

Temperature-0 determinism is subject to active research. NVIDIA has tried but failed so far; DeepSeek V4 seems to have done it. I hope judges won't be swayed by this, and that AI generated code will be classified as uncopyrightable, just like images are.
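(For intuition, here is a toy sketch of the determinism distinction. The `scores` function is made up and merely stands in for a real model's logits - this is not any actual model's decoding loop. Greedy/temperature-0 decoding is deterministic by construction; sampling is reproducible only if you pin the seed and the whole stack:)

```python
import random

VOCAB = ["foo", "bar", "baz"]

def scores(context):
    # Stand-in for model logits: any fixed function of the context works
    # for this illustration. Values are arbitrary positive weights.
    return {t: (hash((context, t)) % 100) + 1 for t in VOCAB}

def greedy_decode(prompt, steps=5):
    # Temperature-0 style: always take the argmax. No randomness is
    # involved, so the same prompt always yields the same tokens.
    out, ctx = [], prompt
    for _ in range(steps):
        s = scores(ctx)
        tok = max(s, key=s.get)  # argmax over candidate tokens
        out.append(tok)
        ctx += " " + tok
    return out

def sampled_decode(prompt, steps=5, seed=None):
    # Temperature > 0 style: draw tokens proportionally to their score.
    # Different runs can give different outputs unless the seed is fixed.
    rng = random.Random(seed)
    out, ctx = [], prompt
    for _ in range(steps):
        s = scores(ctx)
        toks = list(s)
        tok = rng.choices(toks, weights=[s[t] for t in toks])[0]
        out.append(tok)
        ctx += " " + tok
    return out

print(greedy_decode("hello") == greedy_decode("hello"))  # True: greedy is repeatable
```

(Real systems are messier: floating-point nondeterminism across GPUs can break even temperature-0 reproducibility, which is what the research mentioned above is about.)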

alienll 8 hours ago||
Fair point on temp-0. But I don't think determinism is what the courts will hang it on. A deterministic LLM still makes the expressive choices — naming, structure, control flow — that the human didn't make. The image cases didn't turn on whether you could re-roll the same Midjourney frame. They turned on who made the creative decisions. Same logic should hold for code.
Onavo 9 hours ago||
But is there anything stopping a human from applying for copyright in their own name? Does the fact that somebody can recreate the prompt invalidate their claim?
SlinkyOnStairs 1 hour ago|||
What you're asking is "could someone commit fraud" and "would being found out invalidate their copyright", to both of which the answer is generally yes.

It'd be a form of plagiarism, just with different consequences than the most common form.

alienll 8 hours ago|||
Filing isn't the gate, registration is.

Copyright Office requires you to disclose AI involvement and disclaim the AI-generated parts. Zarya of the Dawn is the example — applicant filed for the whole graphic novel, got partial registration on the human-written text, refused on the Midjourney images. The reproducibility of the prompt isn't really the test. The test is whether a human made the expressive choices.

dang 8 hours ago|||
Your comments are getting classified by our software as LLM-generated or (more likely) LLM-edited. It's impossible to be certain, of course, but if this is the case—can you please not do this? It's not allowed here - see https://news.ycombinator.com/newsguidelines.html#generated and https://news.ycombinator.com/item?id=47340079.

LLMs are amazing of course and we use them heavily ourselves - but not for modifying text that is to be posted to HN. Doing so leaves imprints on the language that readers are increasingly becoming allergic to, and we want HN to be a place for human conversation.

Animats 4 hours ago|||
> Filing isn't the gate, registration is.

Not really. Copyright registration is pretty much automatic. The Copyright Office does not check for duplicates. Patent registration involves actual examination for patentability. Issued patents are presumed valid (less so than they used to be), but issued copyrights are not. You have to litigate.

The US does not have "sweat of the brow" copyrights. It's the "spark" that creates the originality, not the work. Which is why you can't copyright a telephone directory (Feist vs. Rural Telephone) or a copy of an uncopyrighted image (Bridgeman vs. Corel) or a scan of a 3D object (Meshwerks vs. Toyota). Or the contents of a database as a collective work. Note that some EU countries do allow database copyright.

Interestingly, a corporation can be an author for copyright purposes. The movie industry pushed for that. We may in time see AI corporate personhood for IP purposes.

p0w3n3d 23 hours ago||
That's quite an impressive approach from the companies' perspective: let's first use claude code, and then we'll think about who the code belongs to.

I think the gold rush approach happening right now around me (my company's EMs forcing me to work with claude as fast as possible) shows the real short-sightedness of all the management people.

First - I lose my understanding of the code base by relying too much on claude code.

Second - we drop all the good coding practices (like XP, code review etc.) because claude is reviewing claude's code.

Third - we just take a big smelly dump on the teamwork - it's easier and cheaper to let one developer drive the whole change from backend to frontend, even though there are (or were) two different teams - one for FE, one for BE.

Fourth - code commenting was passé, because the code is supposed to be its own documentation... unless there is a problem with the context (which there is). When people were writing the code, it was their own fault if they couldn't understand the over-engineered code. But now we take a step back for our beloved claude because it has a small context... It's unfair treatment.

I could go on and on. And all those cultural changes are because of money. So I dub this "goldrush", open my popcorn and see what happens next.

nicoburns 23 hours ago||
> Third - we just take a big smelly dump on the teamwork - it's easier and cheaper to let one developer drive the whole change from backend to frontend, despite there are (or were) two different teams - one for FE, one for BE.

Agree with your other points, but IMO this one has always been better. You often need to design the backend and frontend to work with each other, and that requires a lot more coordination when it's separate teams.

ryandrake 17 hours ago||
One of the few things I do kind of like about LLM-assisted coding is that it's helping to bring back "lone wolf" programming. We currently default to using massive teams to build massive software because of all the work involved, but teams have a huge communication/documentation cost, and a lot can leak and be lost the more communication has to happen to get things done. Code assistants cut down on the "all the work involved" part, and I think will help to bring one-man shops back into fashion.
eddyfromtheblok 10 hours ago|||
People have quickly forgotten: when Copilot was announced, there were warnings not to use it for company code because of the license attribution problem. So what's changed? That Anthropic is willing to defend and indemnify?
sebastianconcpt 23 hours ago|||
Also, it's supremely easy to pick the wrong abstractions long term and to commit prematurely to internal designs that will start to starve of human mental modeling - and hence of anyone who can explain, with accountability, how things work and what the plans are when an incident happens. Also, if the wrong generalizations are introduced, coded correctly, and reviewed and approved by AIs, then who's even driving, really?
bearjaws 23 hours ago|||
I rarely see #3 yield better solutions, it's usually better to collaborate as a team on requirements and gotchas, but let one person own implementation.
p0w3n3d 14 hours ago||
But both backend and front-end? Does everyone have to be full stack?
refulgentis 15 hours ago|||
I opened my popcorn for the unholy trinity of HN x law x AI, your comment was one of my faves, love the purple prose. :)
senaevren 22 hours ago|||
The fourth point about code commenting is the one that connects directly to the ownership question. When developers write comments to explain intent, those comments are evidence of human creative direction. When Claude writes the code and the comments, and the developer merges without adding their own explanation of the architectural decisions, the record of human authorship disappears along with the institutional knowledge. The documentation problem and the copyright problem are the same problem.
cindyllm 23 hours ago||
[dead]
fsckboy 7 hours ago||
It's well known that recipes cannot be copyrighted. But recipes are still protected intellectual property under trade secret law if they are treated as a secret by the holder of the recipe.

Claude code itself is a trade secret, and it is not open source, so its own copyrightability is moot till you get your hands on a copy of it with clean hands.

Recipes cannot be copyrighted because they are not expressions of human creativity. Software written by AIs is likewise not an expression of human creativity, so the balance is tilted in favor of AI-generated copy not being copyrightable.

The Supreme Court or legislation could change this, and I'd guess there will be a movement in that direction, but until something like that succeeds, it's not so.

lelanthran 4 hours ago||
> But recipes still are protected intellectual property by trade secret law if they are treated as a secret by the holder of the recipe.

Trade secrets aren't very well protected, though.

You can sue the person who leaked/stole your secret, but if others keep sharing it once it is leaked you can do nothing to them.

Culonavirus 7 hours ago||
> Software written by AIs are also not expressions of human creativity

I mean I'm not the biggest fan of AI on the planet by any means (which I think my post history would prove, lol), but isn't prompt design and steering the AI "human creativity"? In one of my AI-assisted projects I spent like a week in unending threads of posts trying to make the AI do stuff the way I wanted, testing the output, finding a bazillion of bugs and "basic bitch" solutions, asking for more robust this and edge case that. It felt like I wrote a novel. How is that not creativity (Crayon-eater or Picasso, creativity is creativity)?

comonoid 3 hours ago||
I wonder when my manager "prompts" me "I want the feature X and I want it fast", is his prompt a human creativity?
bko 23 hours ago||
This is all well and good as an intellectual exercise, but in real life none of this matters. Almost no one thinks their code is copyrightable or seriously thinks their code is a moat. I've written the same chunks of code for a number of employers as has every engineer. We've all taken chunks from stack overflow and other places without carefully considering attribution.

This comes up in a few places as a kind of vindictive battle. One example is Oracle suing Google for too closely mimicking their API in Android. Here is an excerpt:

    private static void rangeCheck(int arrayLen, int fromIndex, int toIndex) {
        if (fromIndex > toIndex)
            throw new IllegalArgumentException("fromIndex(" + fromIndex +
                                               ") > toIndex(" + toIndex + ")");
        if (fromIndex < 0)
            throw new ArrayIndexOutOfBoundsException(fromIndex);
        if (toIndex > arrayLen)
            throw new ArrayIndexOutOfBoundsException(toIndex);
    }

And it was deemed fair use by the Supreme Court. Other times, high frequency hedge funds have sued exiting employees, sometimes successfully. In America anyone can sue you for any reason, so sure, you'll have Ellison take up a feud with Page and Brin all the way to the Supreme Court.

In 99.9% of instances none of this matters. Sure, there's the technical letter of the law, but in practice, and especially now, none of it matters.

https://www.supremecourt.gov/opinions/20pdf/18-956_d18f.pdf

freedomben 23 hours ago||
> Almost no one thinks their code is copyrightable or seriously thinks their code is a moat.

You'd be surprised! Among non-software management types, they often think of the code as extremely valuable IP and a trade secret. I'm a CTO and I've made comments before to non/less technical peers about how the code (generally speaking) isn't that big of a secret, and I routinely get shocked expressions. In one case the company almost passed on a big contract because it required disclosure of the source code (with an NDA). When I told them that was a silly reason and explained why, they got it, but the old way of thinking still permeates and is a hard habit to break.

Edit: Fixed errant copy pasta error. Glad that wasn't a password :-)

mbesto 15 hours ago|||
Totally agreed.

I work in M&A. Nearly every lawyer, accountant, investor, and software business owner thinks their code is uniquely valuable and a trade secret. I find it hilarious and try to be as diplomatic as possible about why it's not. They'll also willingly give their client list to a potential acquirer but get super cagey the moment a third-party provider asks for their code to be scanned.

This argument easily gets shut down when I ask why Twitch, a $1B business, didn't crater to its competition when its full codebase was leaked.

bko 23 hours ago||||
You're right, I guess maybe I mean in any serious actionable way. Senior, non technical people leave plenty of money on the table by thinking they're protecting something valuable or they have some kind of secret sauce. It's all silly is what I meant to say, and digging into the technicalities of whether your code is truly copyrightable is kind of pointless. It's all vibes.
senaevren 22 hours ago||
The place where it concretely matters is M&A due diligence. Acquirers are now routinely asking about AI tool usage in development and running license scans as a condition of closing. A codebase that cannot demonstrate human authorship over its core IP, or that contains GPL contamination, creates a representation and warranty problem in the purchase agreement. For most companies day to day you are right. For the companies that get acquired or raise institutional capital, the question becomes very concrete very quickly.
freedomben 17 hours ago||
Very interesting, I had no idea. That's probably going to be a very painful lesson learned by all the startups that have been pumping out AI code. I know of several just among my peer groups that will be shocked and dismayed by this. Thanks for sharing that!
senaevren 16 hours ago||
That is exactly the gap the piece is aimed at. The M&A conversation is where this becomes concrete very fast, and most founders shipping AI-assisted code have not had it yet.
mbesto 15 hours ago||
Eh, it does and it doesn't. PE investors actively are asking why more of the portfolio companies aren't generating codebases using Claude Code. You are right that lawyers are asking about code generated by LLMs but this is more of a CYA out of ignorance more than anything else (btw - many purchase agreements have funny representations like "your code is free of bugs" which is downright hilarious).

So these two things are squarely at odds with each other... meaning, I don't know any PE acquirers who are actively terminating deals because the target acquisition's code is generated by an LLM, even if the lawyers try to get a rep about it in the purchase agreement.

For the record, I have yet to have an M&A lawyer explain to me definitively that AI-generated code is an infringement... hence the question "who owns the code Claude Code writes" is still open.

senaevren 15 hours ago||
The tension you are describing is real and the piece does not capture it well enough. PE acquirers pushing portfolio companies toward Claude Code while their lawyers are adding AI code reps to purchase agreements is exactly the gap that will produce the first painful deal. The rep usually survives unsigned because neither side has done the analysis. When the first deal falls apart or a rep is breached post-close because of GPL contamination in an AI-assisted codebase, that will set the market standard faster than any court ruling.
mbesto 15 hours ago||
> When the first deal falls apart or a rep is breached post-close because of GPL contamination in an AI-assisted codebase, that will set the market standard faster than any court ruling.

Assuming it ever does...first, GPL is hardly enforced and second, I feel like there is going to be enough money (e.g. Anthropic's own code it uses for the harness) that pushes back against it being problematic. We'll see.

hackingonempty 22 hours ago|||
Maybe LLM coding agents change the equation by making it much easier to adapt and use foreign, probably incomplete code, getting you closer to competing with the original authors in less time than writing new code from scratch would take.
sarchertech 22 hours ago|||
> Almost no one thinks their code is copyrightable

Every open source license is built on the premise that code is copyrightable.

adrian_b 17 hours ago||
No.

It is based on the premise that if the proprietary licenses are valid, then also the open source licenses are valid.

So what is held as true is only the implication stated above, not the truth value of the claim that either kind of license is valid.

If the proprietary licenses are not valid, then it does not matter that also the open source licenses are not valid.

The open source licenses are intended as defenses against the people who would otherwise attempt to claim ownership of that code and apply a proprietary license to the code, i.e. exactly what now Anthropic and the like have done, together with their corporate customers.

Of course, if it is accepted that the code generated by an AI coding assistant is not copyrightable, then using it would not really be a violation of the original open source licenses. The problem is that even if this principle is the one accepted legally, at least for now, both Anthropic and their corporate customers appear to assume that they own the copyright for this code that should have been either non-copyrightable or governed by the original licenses of the code used for training.

sarchertech 15 hours ago||
Yes.

“ Copyright <YEAR> <COPYRIGHT HOLDER>

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.”

The copyright assertion is the very first line of the MIT license, and the right to copy the code is granted. Clearly a reasonable person would affirm that that license (and all similar licenses) are based on a premise that code can be copyrighted.

> It is based on the premise that if the proprietary licenses are valid, then also the open source licenses are valid.

>If the proprietary licenses are not valid, then it does not matter that also the open source licenses are not valid.

That’s not true. Imagine a world where proprietary licenses are made invalid.

In such a world a company could take open source code compile it and distribute it (or build a SaaS) without the source code.

Even if you only focus on licenses that don’t prohibit this, most of those licenses require attribution.

So even in a world where propriety licenses were invalid the majority of open source licenses would still have a purpose.

You’re attempting to split hairs to argue on a very subtle technicality, but you’re not even technically right.

pocksuppet 14 hours ago||
MIT just disclaims all the author's rights except attribution. If it turns out the code isn't copyrightable, nothing really changes. A better example would be GPL.
sarchertech 10 hours ago||
I mentioned that in my comment, but attribution is a big deal.
conartist6 23 hours ago|||
Nobody ever talks about convergence.

You, right now, are talking about convergence.

If there is no artwork, there can be no copyright. If every character of the code to write is basically predetermined by the APIs you need to call, there is no artwork and no copyright.

Build a novel API, though, and you'll be protected.

croes 23 hours ago|||
> Almost no one thinks their code is copyrightable

Then why does reverse engineered code need to be a clean room implementation?

Ask any emulator developer or the developers of ReactOS

https://reactos.org/forum/viewtopic.php?t=21740

Nursie 23 hours ago|||
> Almost no one thinks their code is copyrightable

I think this is an unusual opinion.

Code may not be copyrightable in as small chunks as you put there, but in terms of larger pieces I think companies and individuals very often labour under the belief that code is intellectual property under copyright law.

If code isn't copyrightable, from where comes the GPL?

And why does anyone care if (for instance) some Microsoft code might have accidentally ended up in ReactOS, causing that project to need to go into a locked-down review mode for months or years? For that matter why do employers assert that they own the copyright in contracts?

I think it's the opposite - almost everyone thinks their code is copyrightable, outside of APIs and interop stuff, or things so simple as to be trivial.

Rietty 23 hours ago||
Why were the HFT firms suing employees?
_flux 23 hours ago||
I think it should be pretty clear that if you provided the tool the specification for the code you want, you have already provided creative input.

After all, is this not what happens with compilers as well? LLM agents are just quite advanced compilers that don't require the specification to be as detailed as with traditional compilers.

yodon 23 hours ago||
>it should be pretty clear that if you provided the tool the specification for the code you want, you have already provided creative input.

If you provided a human contractor with the specifications for the code you want, the courts have repeatedly made clear you have not provided the creative input from a copyright perspective, and the contractor needs to explicitly assign those rights to you if want to own the copyright on the code.

_flux 20 hours ago|||
Let's say we didn't have assemblers, but instead we would have three professions:

- Specifiers, who make the specification for the system

- Programmers, who write C code

- Machine encoders, that take that C code and write machine code for a CPU

Would it be that the copyright would then belong to programmers, if no other explicit assignments would be made?

---

Thinking about it, probably yes: copyright of the spec belongs to the specifiers, copyright of the C code belongs to the programmers, and copyright of the machine code to the machine encoders. Or would it depend on the amount of optimization the machine encoders did, i.e. whether it is creative or not? And how does this relate to the copyrightability of C compiler output, where optimizations can sometimes surprise the developer?

anikom15 15 hours ago|||
LLMs aren’t human.
everforward 16 hours ago|||
Specifications are not necessarily creative input. E.g., if I write a prompt that just says “write a rate limiter in Python”, there's really no creative input: I didn't decide on the API, the algorithm to bucket requests, where to store counters, etc. I just gave it statements of fact, which are inherently not creative.

Compilers are different in that the resulting binaries are not separately copyrighted. They are the same object to the Copyright Office because one produces the other, in the same way that converting an image to a PDF is still the same copyright.

LLMs don’t do that. The stuff coming in may not be copyrighted, and may not be copyrightable. The stuff that comes out is not a rote series of transformations, there are decisions being made. In common use, running a prompt 10 times might yield 10 meaningfully different results.

I’m dubious the outcome will be “any level of prompting is enough creativity”.
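To make that concrete, here is the kind of thing such a bare prompt produces. Every expressive choice in it is the model's, not the prompter's: token bucket rather than sliding window, the class name, the `allow()` API, in-memory storage. A minimal sketch (not output from any actual model run):

```python
import time

class RateLimiter:
    """Token-bucket rate limiter. The prompt never specified any of this."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate                # tokens refilled per second
        self.capacity = capacity        # maximum burst size
        self.tokens = float(capacity)   # counter kept in memory, not Redis etc.
        self.updated = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

limiter = RateLimiter(rate=1, capacity=2)
# In a tight loop the burst of 2 passes, the third call is throttled.
print([limiter.allow() for _ in range(3)])
```

Another run of the same prompt could just as easily return a sliding-window log or a decorator-based design, which is exactly the "10 meaningfully different results" point.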

d0100 7 hours ago|||
The trick is to constrain the LLM to program in a very defined coding style

If I make the LLM generate code that follows my own code architecture and style, that should be enough creative input

pocksuppet 14 hours ago|||
Fine then that's not copyrightable at all. Just like hello world isn't copyrightable, whether in source form or compiled form.
senaevren 23 hours ago|||
The compiler analogy is the right one to reach for and the Copyright Office addressed it directly: the question is not whether you provided input, it is whether the creative expression in the output reflects human authorship. With a traditional compiler, the programmer authors every expression in the source. With an LLM, the programmer authors the intent and the model makes the expressive decisions about structure, naming, pattern, and implementation. Whether that distinction matters legally is what Allen v. Perlmutter is working through right now. The summary judgment briefing completed in early 2026 and it may be the next landmark ruling on exactly this question.
hypercube33 23 hours ago|||
To me this is like asking who owns the binary files a compiler generates.
kk_mors 19 hours ago||
[dead]
dash2 1 hour ago||
I wrote an R library doing some simple regressions using the GPU, with Claude. I asked it to provide the same API as lm, glm and some other base R functions. It copied their code wholesale without mentioning it to me. So, now my library is GPL… which is not a big deal in this context, but it was quite a shock.
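For anyone worried about the same surprise: wholesale copying usually drags the license header along with it, so a crude text scan catches the obvious cases (real license scanners like ScanCode Toolkit or FOSSology go much further, matching code itself rather than just headers). A minimal sketch, with the marker phrases chosen by me as illustrative, not exhaustive:

```python
import os
import re

# Phrases that commonly signal (L)GPL-licensed text ended up in a file.
# Illustrative only -- a real scanner matches far more than headers.
GPL_MARKERS = re.compile(
    r"GNU General Public License|"
    r"GNU Lesser General Public License|"
    r"www\.gnu\.org/licenses",
    re.IGNORECASE,
)

def scan_tree(root: str) -> list:
    """Return paths under `root` whose contents match a GPL marker phrase."""
    flagged = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as fh:
                    if GPL_MARKERS.search(fh.read()):
                        flagged.append(path)
            except OSError:
                pass  # unreadable file; skip rather than crash the scan
    return flagged
```

Running something like `scan_tree("my_r_package/")` over generated output before publishing would have flagged the copied lm/glm headers, though it would miss copied code whose comments were stripped.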
gorgoiler 3 hours ago|
Three things matter when it comes to eating my breakfast sandwich:

1/ Was the pork in my sausage reared on a farm that meets agricultural standards?

2/ Was the food handled safely by the kitchen that cooked my food?

3/ Does the owner of the diner pay kitchen wages in accordance with labor law?

By contrast, I have no idea what went into the models I use, what system prompts have prejudiced it, and whose IP has been exploited in pursuit of my answer.

That’s being charitable, really. In practice, the open secret of the AI industry is that the vast majority of training data is, for want of a better word (though it may be the most precise one), stolen.

amelius 3 hours ago||
Probably, yes, but the burden of proof is with us, not them.

I'm already glad some companies have the guts to open their models because proving it for open models is probably a lot easier than for a model behind a service.

wartywhoa23 2 hours ago|||
The proof is the $stupid-billion infrastructure built and kept up to host mousetraps armed with free cheese made of virtue signalling about doing the right thing and sharing the code with the world for free.
tngranados 2 hours ago|||
That's a matter of changing a law; it's all up to the people and their representatives. We talk as if everything is set in stone, but if there really is a will, there is a way.
ap99 2 hours ago|||
What's an example of data that might have been stolen?
devsda 3 hours ago||
The media industry loves to quote ridiculous numbers on revenue lost to piracy. Maybe rough ballpark numbers will get them to do something about this theft.

Can someone put a rough estimate on the potential revenue loss (direct and incidental) from AI training, with an industry-wise breakdown?

gorgoiler 2 hours ago||
It’s wrong to stop progress. I just want to know what data went into my model and have access to the same data. The same way we have national libraries of books but with the caveat that I don’t really know how one is supposed to browse petabytes of OpenAI .zips like I browse old books.

If the data is proprietary (eg Meta’s stash of FB comments) then I am satisfied to be told it’s private and I can’t see it. If, however, the works were public then give me a URL if it’s live or a cached copy if it isn’t.

More comments...