Posted by todsacerdoti 6/25/2025

Define policy forbidding use of AI code generators (github.com)
551 points | 413 comments | page 3
randomNumber7 6/26/2025|
I know a secret. You can read the code the AI generated for you and check whether it does what you want. It is still faster than writing it yourself most of the time.
JonChesterfield 6/27/2025|
Like skimming through a maths textbook, right? Way quicker than writing one, with the same reassuring sense of understanding.
randomNumber7 6/27/2025||
It's just that code and math textbooks are something different. I already understand how to code; I just need to do something useful with it.
mattl 6/26/2025||
I'm interested to see how this plays out. I'd like a similar policy for my projects, but also a similar policy/T&C that prohibits the crawling of the content too.
candiddevmike 6/26/2025|
The only way to prohibit crawling is to go back to invite-only, probably self-hosted repositories. These companies have no shame; your T&Cs won't mean anything to them, and you have no way of proving they violated them without some kind of discovery into their training data.
N1H1L 6/26/2025||
I use LLMs for generating documentation: I write my code and ask Claude to write my documentation.
auggierose 6/26/2025|
I think you are doing it the wrong way around.
insane_dreamer 6/26/2025||
Maybe not. I trust Claude to write docs. I don’t trust it to write my code the way I want.
ludicrousdispla 6/26/2025||
>> The tools will mature, and we can expect some to become safely usable in free software projects.

It should be possible to build a useful AI code generator for a given programming language solely from the source code of the language itself. Doing so, however, would require some maturity.

UrineSqueegee 6/26/2025||
If AI training on books isn't copyright infringement, then the outputted code isn't copyrighted material either.
naveed125 6/26/2025||
Coolest thing I've seen today.
BurningFrog 6/26/2025||
Would it make sense to include the complete prompt that generated the code with the code?
catlifeonmars 6/26/2025||
You’d need to hash the model weights and save the seed for the sampling PRNG as well, in order to verify the provenance. Ideally it would be reproducible, right?
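For illustration, pinning those pieces down might look something like this (a rough sketch only; the weights file name, seed, and record fields are all hypothetical):

    import hashlib
    import json

    # Hash the exact weights file so the record pins the model itself,
    # not just its marketing name. The path is hypothetical.
    def sha256_file(path):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    provenance = {
        "weights_sha256": sha256_file("model.safetensors"),
        "sampling_seed": 42,   # seed for the sampling PRNG
        "temperature": 0.7,
        "prompt": "Write a C function that parses ...",
    }
    print(json.dumps(provenance, indent=2))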
danielbln 6/26/2025||
Maybe two years ago. Nowadays LLMs call functions and use tools; good luck capturing that in a way that's reproducible.
astrobiased 6/26/2025|||
It would need to be more than that. A prompt can produce different results on one model vs. another. Even when the same model gets different treatment at inference time, e.g. quantization, the same prompt can yield different output from the unquantized and quantized versions.
verdverm 6/26/2025||
Even more so: when you come back in a few years to understand the code, the model will no longer be available.
galangalalgol 6/26/2025||
One of several reasons to use an open model even if it isn't quite as good. Version-control the models, and commit the prompts along with the model name and a hash of the parameters. I'm not really sure what value that reproducibility adds, though.
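Concretely, that could be as small as a manifest committed next to the generated file (everything here, names and placeholder hash included, is made up):

    import json
    import pathlib

    # Hypothetical manifest committed alongside the generated file, so the
    # open model, prompt, and parameter hash travel with the code.
    manifest = {
        "generated_file": "parser.c",
        "model_name": "some-open-model-7b",
        "weights_sha256": "<sha256 of the version-controlled weights>",
        "prompt_file": "prompts/parser.txt",
    }
    pathlib.Path("parser.c.provenance.json").write_text(
        json.dumps(manifest, indent=2) + "\n"
    )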
ethan_smith 6/26/2025||
Including prompts would create transparency, but still wouldn't resolve the underlying copyright uncertainty of the output or guarantee the model wasn't trained on incompatibly licensed material.
randomNumber7 6/26/2025||
I mean, for low-level C code the current LLMs are not that helpful anyway.

On the other hand, I am 100% sure that every company that doesn't use LLMs will be out of business in 10 years.

curious_cat_163 6/26/2025||
That’s very conservative.
wlkr 6/26/2025|
qq