Is legal the same as legitimate: AI reimplementation and the erosion of copyleft

Posted by dahlia 8 hours ago

(writings.hongminhee.org)

236 points | 237 comments | page 4

martin-t 1 hour ago

1) Legality and morality are obviously different and unrelated concepts. More people should understand that.

2) Copyright was the wrong mechanism to use for code from the start; LLMs just exposed the issue. The thing to protect shouldn't be creativity, it should be human work - any kind of work.

The hard part of programming isn't creativity, it's making correct decisions. It's getting the information you need to make them. Figuring out and understanding the problem you're trying to solve, whether it's a complex mathematical problem or a customer's need. And then evaluating solutions until you find the right one. (One constraint being how much time you can spend on it.)

All that work is incredibly valuable, but once the solution exists, it's much easier to copy it without replicating or even understanding the thought process which led to it. But that thought process took time and effort.

The person who did the work deserves credit and compensation.

And he deserves it transitively, if his work is used to build other works - proportional to his contribution. The hard part is quantifying it, of course. But a lot of people these days benefit from throwing their hands up and saying we can't quantify it exactly so let's make it finders keepers. That's exploitation.

3) Both LLM training and inference are derivative works by any reasonable meaning of those words. If LLMs are not derivative works of the training data then why is so much training data needed? Why don't they just build AI from scratch? Because they can't. They just claim they found a legal loophole to exploit other people's work without consent.

I am still hoping the legal people take time to understand how LLMs work and how other algorithms, such as synonym replacement or c2rust, work; that they decide calling it "AI" doesn't magically remove copyright; and that the huge AI companies will be forced to destroy their existing models and train new ones which respect the licenses.

svilen_dobrev 3 hours ago

I've been following this for a while, and the trend of copyright (of any form: books, code, pictures, music, whatever) being laundered by reinventing the "same" thing in some way is kind of clear.

But what happens with the new things? Has the era of software-making (or creating things at large) finished, and from now on everything will be re-(gurgitated|implemented|polished) old stuff?

Or does it all go back to proprietary everything, Tower of Babel style, where no one talks to anyone?

Edit: another view - is open source from now on only for resume-building, "see what I've built" style?

t43562 6 hours ago

Why does anyone need his new library? They can do what he did and make their own.

I'm glad we can fork things at a point and thumb our noses at those who wish to cash in on others' work.

warkdarrior 5 hours ago

Why would I make my own? The new library is released under the MIT license and is faster than the old one.

t43562 5 hours ago

If you decide to improve it in any way to fit your needs you can merely tell your own AI to re-implement it with your changes. Then it's proprietary to you.

strongpigeon 6 hours ago

I feel like the licenses that suffer the most aren't the GPL, but ones like the SSPL. If your code can be re-implemented easily and legally by AWS using an LLM, why risk publishing it?

It does feel like open source is about to change. My hunch is that commercial open source (beyond the consultation model) risks disappearing. Though I'd be happy to be proven wrong.

mh2266 4 hours ago

Buried in here: Mark Pilgrim suddenly reappearing after his sudden disappearance years ago! Has he been up to anything since then?

sayrer 6 hours ago

I don't think this part is correct: "If you distribute modified code, or offer it as a networked service, you must make the source available under the same terms."

That's what something like AGPL does.

dwroberts 6 hours ago

One of the things that irks me about this whole thing is, if it’s so clean room and distinct, why make the changes to the existing project? Why not make an entirely new library?

The answer to that, I think, is that the authors wanted to squat on an existing successful project and gain a platform from it. Hence we have a news cycle discussing it.

Nobody cares about a new library using AI, but squat on an existing one with this stuff, and you get attention. It’s the reputation, the GitHub stars, whatever.

nicole_express 6 hours ago

I mean, Blanchard was the longtime maintainer of chardet already, and had wanted to relicense it for years. So I think that complicates your picture of "squatting an existing successful project".

Honestly it's a weird test case for this sort of thing. I don't think you'd see an equivalent in most open source projects.

intrasight 6 hours ago

I agree. But you can't copyright goodwill and reputation. Trademark does provide some protection there, right?

mbgerring 2 hours ago

See also "A Declaration of the Independence of Cyberspace" (https://www.eff.org/cyberspace-independence), and what a goofy, naive, misguided disaster that early internet optimism turned into.

No, AI does not mean the end of either copyright or copyleft; it means that the laws need to catch up. And they should, and they will.

mwkaufma 5 hours ago

A lot of untagged IANAL takes here today.
