Posted by dahlia 14 hours ago

Is legal the same as legitimate: AI reimplementation and the erosion of copyleft(writings.hongminhee.org)
381 points | 414 comments | page 8
martin-t 7 hours ago|
1) Legality and morality are obviously different and unrelated concepts. More people should understand that.

2) Copyright was the wrong mechanism to use for code from the start, LLMs just exposed the issue. The thing to protect shouldn't be creativity, it should be human work - any kind of work.

The hard part of programming isn't creativity, it's making correct decisions. It's getting the information you need to make them. Figuring out and understanding the problem you're trying to solve, whether it's a complex mathematical problem or a customer's need. And then evaluating solutions until you find the right one. (One constraint being how much time you can spend on it.)

All that work is incredibly valuable, but once the solution exists, it's much easier to copy without replicating or even understanding the thought process which led to it. But that thought process took time and effort.

The person who did the work deserves credit and compensation.

And he deserves it transitively, if his work is used to build other works - proportional to his contribution. The hard part is quantifying it, of course. But a lot of people these days benefit from throwing their hands up and saying we can't quantify it exactly so let's make it finders keepers. That's exploitation.

3) Both LLM training and inference are derivative works by any reasonable meaning of those words. If LLMs are not derivative works of the training data then why is so much training data needed? Why don't they just build AI from scratch? Because they can't. They just claim they found a legal loophole to exploit other people's work without consent.

I am still hoping the legal people take the time to understand how LLMs work and how other algorithms, such as synonym replacement or c2rust, work; decide that calling it "AI" doesn't magically remove copyright; and force the huge AI companies to destroy their existing models and train new ones which respect the licenses.

wvenable 5 hours ago|
> If LLMs are not derivative works of the training data then why is so much training data needed?

If you went to school for 12-16 years, that's a lot of training. Does that mean anything you produce is a derivative work?

animitronix 7 hours ago||
LGPL is dead, long live the AI rewrites of your barely open source code
iberator 8 hours ago||
Easy solution for now:

Add something like this to NEW GPL/BSD/MIT licenses:

'you are forbidden from reimplementing it with AI'

or just:

'all clones, reimplementations with AI, etc. must still be GPL'

moralestapia 9 hours ago||
That's a non sequitur. chardet v7 is a GPL-derived work (currently in clear violation of the GPL). If xe wanted it to be a different thing, xe should've published it as such. Simple as.
casey2 11 hours ago||
If the model hadn't been trained on copyleft code, if he hadn't used a copyleft test suite, and if he hadn't been the maintainer for years, it might be different. Clearly the intent here is copyright infringement.

If you ship software, your test suite is part of that software; you can't develop against a copyleft test suite and then release the result as MIT without releasing one. Depending on the test suite, it may break clean-room rules, especially for TDD codebases.

righthand 11 hours ago||
I think what is happening is the collapse of the "greater good". Open source depends upon providing information for the greater good and general benefit of its readers. However, now that no one is reading anything, its purpose is for the greater good of the most clever or most convincing or richest harvester.
delichon 12 hours ago||
Imagine if the author had his way, and when we have AI write software, it legally falls under the license of some other sufficiently similar piece of software. Which may or may not be proprietary. "I see you have generated a todo app very similar to Todoist. So they now own it." That does not seem like a good path either for open source software or for opening up the benefits of AI-generated software.
moi2388 12 hours ago||
Perhaps we should finally admit that copyright has always been nonsense, and abolish this ridiculous measure once and for all
vladms 12 hours ago||
Probably a wiser approach is to consider different times require different measures (in general!).

I did not study in detail whether copyright "has always been nonsense", but I do agree that nowadays some of the copyright regulations are nonsense (for example, the very long duration of life + 70 years).

intrasight 11 hours ago|||
I think AI is very much eroding the legitimacy of copyright - at least as applied to software, whose copyrightability has long been questioned since it's more like math than creative expression.

I think the industry will realize that it made a huge mistake by leaning on copyright for protection rather than on patents.

joshmoody24 9 hours ago|||
IMO the core idea of copyright isn't nonsense, but I do think the current implementation (70+ years after death) is egregiously overpowered. I've always thought the current laws were too deeply entrenched to ever change, but I'm tentatively optimistic AI will shock the system hard enough to trigger actual reform.
mbgerring 8 hours ago||
Actually I think the last 20 years of the Internet demonstrate that copyright is more important than ever, because unless it's enforced, people with more capital than the copyright owner will simply steal creative works and profit from them.

The idea that "information wants to be free" was always a lie, meant to transfer value from creators to platform owners. The result of that has been disastrous, and it's long past time to push the pendulum in the other direction.

throawayonthe 12 hours ago|
so now we have to think about the tradeoffs in adopting

- proprietary

- free

- slop-licensed

software?

megous 7 hours ago|
We should just use LLMs to free more software and hardware. Make it work against the system.