Posted by dahlia 5 hours ago
This whole article is just complaining that other people didn't have the discussion he wanted.
Ronacher even acknowledged that it's a different discussion, and not one they were trying to have at the moment.
If you want to have it, have it. Don't blast others for not having it for you.
> But law only says what conduct it will not prevent—it does not certify that conduct as right. Aggressive tax minimization that never crosses into illegality may still be widely regarded as antisocial. A pharmaceutical company that legally acquires a patent on a long-generic drug and raises the price a hundredfold has not done something legal and therefore fine. Legality is a necessary condition; it is not a sufficient one.
It might even be morally abhorrent to have such a discussion in the first place!
Think about it; the license says that copies of the work must be reproduced with the copyright notice and licensing clauses intact. Why would anyone obey that, knowing it came from AI?
Countless instances of such licenses were ignored in the training data.
A lego sculpture is copyrighted. Lego blocks are not. The threshold between blocks and sculpture is not well-defined, but if an AI isn't prompted specifically to attempt to mimic an existing work, its output will be safely on the non-copyrighted side of things.
A derivative work is separately copyrightable, but redistribution needs permission from the original author too. Since that usually won't be granted or would be uneconomical, the derivative work can't usually be redistributed.
AI-produced material is inherently not copyrightable, but not because it's a derivative work.
There was an issue where Google did something similar with the JVM, and ultimately it came down to whether or not Oracle owned the copyright to the header files containing the API. It went all the way to the US supreme court, and they ruled in Google's favour; finding that the API wasn't the implementation, and that the amount of shared code was so minimal as to be irrelevant.
They didn't anticipate that in less than half a decade we'd have technology that could _rapidly_ reimplement software given a strong functional definition and contract enforcing test suite.
I like the article's point of legal vs. legitimate here, though; copyright is actually something of a strange animal to use to protect source code, it was just the most convenient pre-existing framework to shove it in.
which is the actual relevant part: they didn't do that dance AFIK
AI is a tool, they set it up to make a non-verbatim copy of a program.
Then they feed it the original software (AFIK).
Which makes it a side by side copy, as in the original source was used as reference to create the new program. Which tend to be seen as derived work even if very different.
IMHO They would have to:
1. create a specification of the software _without looking at the source code_, i.e. by behavior observation (and an interface description). I.e. you give the AI access to running the program, but not to looking into the insides of it. I really don't think they did it as even with AI it's a huge pain as you normally can't just brute force all combinations of inputs and instead need to have a scientific model=>test=>refine loop (which AI can do, but can take long and get stuck, so you want it human assisted, and the human can't have inside knowledge about the program).
2. then generate a new program from specification, And only from it. No git history, no original source code access, no program access, no shared AI state or anything like that.
Also for the extra mile of legal risk avoidance do both human assisted and use unrelated 3rd parties without inside knowledge for both steps.
While this does majorly cut cost of a clean room approach, it still isn't cost free. And still is a legal mine field if done by a single person, especially if they have enough familiarity to potentially remember specific peaces of code verbatim.
My understanding is they did do the dance. From the article: "He fed only the API and the test suite to Claude and asked it to reimplement the library from scratch."
One could still make the argument that using the test suite was a critical contributing factor, but it is not a part of the resulting library. So in my uninformed opinion, it seems to me like the clean room argument does apply.
So my understanding was that the original code was specifically not fed into Claude. But was almost certainly part of its training data, which complicates things, but if that's fair use then it's not relevant? If training's not fair use and taints the output, then new-chardet is a derivative of a lot of things, not just old-chardet...
This is all new legal ground. I'm not sure if anyone will go to court over chardet, though, but something that's an actual money-maker or an FSF flagship project like readline, on the other hand, well that's a lot more likely.
This is an interesting reversal in itself. If you make the specification protected under copyright, then the whole practice of clean room implementations is invalid.
That’s just your subjective opinion which many other people would disagree. I bet Armin Ronacher would agree that an MIT licensed library is even freer than an LGPL licensed library. To them, the vector is running from free to freer.
I'm glad we can fork things at a point and thumb our noses at those who wish to cash in on other's work.
It does feel like open source is about to change. My hunch is that commercial open source (beyond the consultation model) risks disappearing. Though I'd be happy to be proven wrong.