Posted by keybits 1 day ago

We need a clearer framework for AI-assisted contributions to open source (samsaffron.com)
244 points | 129 comments
darkwater 1 day ago|
The title doesn't do justice to the content.

I really liked the paragraph about LLMs being "alien intelligence":

   > Many engineers I know fall into 2 camps, either the camp that find the new class of LLMs intelligent, groundbreaking and shockingly good. In the other camp are engineers that think of all LLM generated content as “the emperor’s new clothes”, the code they generate is “naked”, fundamentally flawed and poison.

   I like to think of the new systems as neither. I like to think about the new class of intelligence as “Alien Intelligence”. It is both shockingly good and shockingly terrible at the exact same time.

   Framing LLMs as “Super competent interns” or some other type of human analogy is incorrect. These systems are aliens and the sooner we accept this the sooner we will be able to navigate the complexity that injecting alien intelligence into our engineering process leads to.
It's an analogy I find compelling. The way they produce code and the way you have to interact with them really does feel "alien", and when you start humanizing them you get emotional interacting with them, and that's not right. I mean, I get emotional and frustrated even when good old deterministic programs misbehave and there's some bug to find and squash or work around, but LLM interactions can take that to a whole new level. So, we need to remember they are "alien".
andai 1 day ago||
Some movements expected alien intelligence to arrive in the early 2020s. They might have been on the mark after all ;)
reedlaw 1 day ago|||
Isn't the intelligence of every other person alien to ourselves? The article ends with a need to "protect our own engineering brands" but how is that communicated? I found this [https://meta.discourse.org/t/contributing-to-discourse-devel...] which seems woefully inadequate. In practice, conventions are communicated through existing code. Are human contributors capable of grasping an "engineering brand" by working on a few PRs?
darkwater 11 hours ago||
> Isn't the intelligence of every other person alien to ourselves?

If we agree that we are all humans and assume that all other humans are conscious as one is, I think we can extrapolate that there is a generic "human intelligence" concept, even if it's pretty hard to nail down and there are several definitions of human intelligence out there.

As for the other part of the comment: I'm not too familiar with Discourse's open source approach, but I guess those rules are there mainly for employees; since they develop in the open and in public, they make the rules public as well.

reedlaw 6 hours ago||
My point was that AI-produced code is not so foreign that no human could produce it, nor do any two humans produce the same style of code. So I'm not sure exactly what the idea of an "engineering brand" is meant to protect.
keiferski 1 day ago|||
This is why, at a fundamental level, the concept of AGI doesn't make a lot of sense. You can't measure machine intelligence by comparing it to a human's. That doesn't mean machines can't be intelligent...but rather that the measuring stick cannot be an abstracted human being. It can only be the accumulation of specific tasks.
wat10000 1 day ago||
I’m reminded of Dijkstra: “The question of whether machines can think is about as relevant as the question of whether submarines can swim.”

These new submarines are a lot closer to human swimming than the old ones were, but they’re still very different.

prymitive 1 day ago||
The problem with AI isn’t new, it’s the same old problem with technology: computers don’t do what you want, only what you tell them. A lot of PRs can be judged by how well they are described and justified, because the code itself isn’t that important; it’s the problem you are solving with it that is. People are often great at defining problems, AIs less so IMHO. Partially because they simply have no understanding, partially because they over-explain everything to the point where you just stop reading, and so you never get to the core of the problem. And even if you do, there’s a good chance the AI misunderstood the problem and the solution is wrong in some more or less subtle way. This is further made worse by the sheer overconfidence of AI output, which quickly erodes any trust that it did understand the problem.
gordonhart 1 day ago||
> As engineers it is our role to properly label our changes.

I've found myself wanting line-level blame for LLMs. If my teammate committed something that was written directly by Claude Code, it's more useful to me to know that than to have the blame assigned to the human through the squash+merge PR process.

Ultimately somebody needs to be on the hook. But if my teammate doesn't understand it any better than I do, I'd rather that be explicit and avoid the dance of "you committed it, therefore you own it," which is better in principle than in practice IMO.
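
A minimal sketch of what that could look like today, assuming teams adopt a commit trailer for it (GitHub already understands "Co-authored-by:" trailers; the "Assisted-by:" trailer, helper names, and marker format below are invented for illustration):

    # Tag blame lines whose commits declared AI assistance via a
    # (hypothetical) "Assisted-by:" commit trailer.
    import subprocess

    def ai_assisted_commits(path: str) -> set[str]:
        """Short hashes of commits touching `path` that carry the trailer."""
        out = subprocess.run(
            ["git", "log",
             "--format=%h%x09%(trailers:key=Assisted-by,valueonly)",
             "--", path],
            capture_output=True, text=True, check=True,
        ).stdout
        flagged = set()
        for line in out.splitlines():
            sha, _, trailer = line.partition("\t")
            if trailer.strip():  # non-empty value => declared AI assistance
                flagged.add(sha[:7])
        return flagged

    def annotated_blame(path: str) -> None:
        """Print `git blame` output, marking lines from AI-assisted commits."""
        flagged = ai_assisted_commits(path)
        blame = subprocess.run(
            ["git", "blame", "--abbrev=7", path],
            capture_output=True, text=True, check=True,
        ).stdout
        for line in blame.splitlines():
            sha = line.split()[0].lstrip("^")
            print(("[AI] " if sha[:7] in flagged else "     ") + line)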

andrewflnr 7 hours ago|
If your teammate doesn't understand it, they shouldn't have committed it. This isn't a "dance", it's basic responsibility for your actions.
bloppe 1 day ago||
Maybe we need open source credit scores. PRs from talented engineers with proven track records of high quality contributions would be presumed good enough for review. Unknown, newer contributors could have a size limit on their PRs, with massive PRs rejected automatically.
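
A toy sketch of such a gate; the thresholds and the use of merged-PR count as "reputation" are invented for illustration:

    # Size-limit PRs from unproven contributors; lift the gate once they
    # have a track record. All numbers here are arbitrary examples.
    from dataclasses import dataclass

    @dataclass
    class Contributor:
        merged_prs: int  # prior merged contributions to this project

    TRUSTED_AFTER = 5             # merged PRs before the size gate lifts
    NEWCOMER_MAX_DIFF_LINES = 300

    def admit_for_review(author: Contributor, diff_lines: int) -> bool:
        """Auto-reject oversized PRs from unproven contributors."""
        if author.merged_prs >= TRUSTED_AFTER:
            return True
        return diff_lines <= NEWCOMER_MAX_DIFF_LINES

    # admit_for_review(Contributor(merged_prs=0), diff_lines=4000) -> False
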
mfenniak 1 day ago||
The Forgejo project has been gently trying to redirect new contributors into fixing bugs before trying to jump into the project to implement big features (https://codeberg.org/forgejo/discussions/issues/337). This allows a new contributor to get into the community, get used to working with the codebase, do something of clear value... but for the project a lot of it is about establishing reputation.

Will the contributor respond to code-review feedback? Will they follow up on work? Will they work within the code of conduct and learn the contributor guidelines? All great things to figure out on small bugs, rather than after the contributor has done significant feature work.

selfhoster11 1 day ago||
We don't need more KYC, no.
javier123454321 1 day ago||
Reputation building is not KYC. It is actually the thing that enables anonymity to work in a more sophisticated way.
specproc 1 day ago||
A bit of a brutal title for what's a pretty constructive and reasonable article. I like the core: AI-produced contributions are prototypes, belong in branches, and require transparency and commitment as a path to being merged.
Lerc 1 day ago||
Is it possible that some projects could benefit from triage volunteers?

There are plenty of open source projects where it is difficult to get up to speed with the intricacies of the architecture, which limits the ability of talented coders to contribute on a small scale.

There might be merit in having a channel for AI contributions that casual helpers can assess to see if they pass a minimum threshold before passing on to a project maintainer to assess how the change works within the context of the overall architecture.

It would also be fascinating to see how good an AI would be at assessing the quality of a set of AI-generated changes absent the instructions that generated them. It may not be able to clearly identify whether a change would work, but can it at least rank a collection of submissions to select the ones most worth looking at?
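
Even a crude version of that ranking pass would be easy to wire up; in this sketch `ask_model` is a stand-in for whatever LLM API a project uses, not a real client library:

    # Rank candidate patches by asking a model to score each diff in
    # isolation, without the prompts that generated them.
    def ask_model(prompt: str) -> str:
        raise NotImplementedError("plug in an actual LLM client here")

    def rank_patches(patches: list[str]) -> list[int]:
        """Return patch indices ordered most-promising-first."""
        scored = []
        for i, diff in enumerate(patches):
            reply = ask_model(
                "Rate this patch from 0 to 10 for plausibility and code "
                "quality. Reply with a single integer.\n\n" + diff
            )
            try:
                scored.append((int(reply.strip()), i))
            except ValueError:
                scored.append((0, i))  # unparseable reply sorts last
        return [i for _, i in sorted(scored, reverse=True)]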

At the very least, the pile of PRs counts as data about things people wanted to do. Even if the code was completely unusable, placing it into a pile somewhere might make it minable for the intentions of would-be contributors.

jcgrillo 1 day ago||
I guess the main question I'm left with after reading this is "what good is a prototype, then?" In a few of the companies I've worked at there was a quarterly or biannual ritual called "hack week" or "innovation week" or "hackathon" where engineers form small teams and try to bang out a pet project super fast. Sometimes these projects get management's attention, and get "promoted" to a product or feature. Having worked on a few of these "promoted" projects, to the last they were unmitigated disasters. See, "innovation" doesn't come from a single junior engineer's 2AM beer and pizza fueled fever dream. And when you make the mistake of believing otherwise, what seemed like some bright spark's clever little dream turns into a nightmare right quick. The best thing you can do with a prototype is delete it.
corytheboyd 1 day ago|
Completely agree, I hate the “hackathon” for so many reasons, guess I’ll vent here too. All of this from the perspective of one frustrated software engineer in web tech.

First of all, if you want innovation, why are you forcing it into a single week? You very likely have smart people with very good ideas, but they’re held back by your number-driven bullshit. These orgs actively kill innovation by reducing talent to quantifiable rows of data.

A product cobbled together from shit prototype code very obviously stands out. It has various pages that don’t quite look/work the same; cross-functional things that “work everywhere else” don’t in some parts.

It rewards only the people who make good presentations, or who pick the “current hype thing” to work on. Occasionally something good that addresses real problems is at least mentioned, but the hype thing will always win (if judged by your SLT).

Shame on you if the slop prototype is handed off to some team other than the hackathon presenters. Presenters take all the promotion points, then implementers have to sort out a bunch of bullshit code, very likely being told to just ship the prototype: “it works you idiots, I saw it in the demo, just ship it.” Which is so incredibly shortsighted.

I think the depressing truth is your executives know it’s all cobbled-together bullshit, but that it will sell anyway, so why invest time making it actually good? They all have their golden parachutes; what do they care about the suckers stuck on-call for the house of cards they were forced to build, despite possessing the talent to make it stable? All this stupidity happens over and over again, not because it is wise, or even the best way to do this; the truth is just a flaccid “eh, it’ll work though, fuck it, let’s get paid.”

jcgrillo 1 day ago||
You touched on this but to expand on "numbers driven bullshit" a bit, it seems to me the biggest drag on true innovation is not quantifiability per se but instead how organizations react to e.g. having some quantifiable target. It leaves things like refactoring for maintainability or questioning whether a money-making product could be improved out of reach. I've seen it happen multiple times where these two forces conspire to arrive at the "eh, fuck it" place--like the code is a huge mess and difficult to work on, and the product is "fine" in that it's making revenue although customers constantly complain about it. So instead of building the thing customers actually want in a sustainable way we just... do nothing.

We have to do better than that before congratulating ourselves about all the wonderful "innovation".

andai 1 day ago||
> That said, there is a trend among many developers of banning AI. Some go so far as to say “AI not welcome here”, find another project.

> This feels extremely counterproductive and fundamentally unenforceable to me. Much of the code AI generates is indistinguishable from human code anyway. You can usually tell a prototype that is pretending to be a human PR, but a real PR a human makes with AI assistance can be indistinguishable.

Isn't that exactly the point? Doesn't this achieve exactly what the whole article is arguing for?

A hard "No AI" rule filters out all the slop, and all the actually good stuff (which may or may not have been made with AI) makes it in.

When the AI assisted code is indistinguishable from human code, that's mission accomplished, yeah?

Although I can see two counterarguments. First, it might just be Covert Slop. Slop that goes under the radar.

And second, there might be a lot of baby thrown out with that bathwater. Stuff that was made in conjunction with AI, contains a lot of "obviously AI", but a human did indeed put in the work to review it.

I guess the problem is there's no way of knowing that? Is there a Proof of Work for code review? (And a proof of competence, to boot?)

felipeerias 1 day ago||
Personally, I would not contribute to a project that forced me to lie.

And from the point of view of the maintainers, it seems a terrible idea to set up rules with the expectation that they will be broken.

1gn15 8 hours ago|||
I know, right. It's like setting up rules saying "you can't use IDE autocomplete" or "you can't code with background music because it distracts you from bugs". If the final result is indistinguishable, I find it perfectly acceptable to lie. Rules are just words, after all, especially when they're completely unenforceable.

Or, the decentralized, no rulers solution: clone the repo on your own website and put your patches there instead.

danaris 1 day ago|||
...YYyyeah, that says a lot about you, and nothing about the project in question.

"Forced you to lie"?? Are you serious?

If the project says "no AI", and you insist on using AI, that's not "forcing you to lie"; that's you not respecting their rules and choosing to lie, rather than just go contribute to something else.

sgarland 1 day ago|||
> I guess the problem is there's no way of knowing that? Is there a Proof of Work for code review?

In a live setting, you could ask the submitter to explain various parts of the code. Async, that doesn’t work, because presumably someone who used AI without disclosing that would do the same for the explanation.
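
Picking the parts to ask about could at least be automated; a rough sketch, assuming plain unified diffs (`sample_hunks` is an invented helper, not a real tool):

    # Sample a few hunks from a unified diff to quiz the submitter on.
    import random
    import re

    def sample_hunks(diff: str, k: int = 3) -> list[str]:
        """Split on hunk headers (@@ ... @@) and pick k hunks at random."""
        hunks = re.split(r"(?m)^(?=@@ )", diff)
        hunks = [h for h in hunks if h.startswith("@@")]
        return random.sample(hunks, min(k, len(hunks)))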

zdragnar 1 day ago||
Based on interviews I've run, people who use AI heavily have no problem also using it during a live conversation to do their thinking for them there, too.
jrochkind1 1 day ago||
Well, instead of saying "No AI" and accepting that people will lie undetectably, why not just say "Only AI when you spend the time to turn it into a real reviewed PR, which looks like X, Y, and Z", giving some actual tips on how to use AI acceptably? Which is what OP suggests.
insane_dreamer 16 hours ago||
related discussion: https://news.ycombinator.com/item?id=45330378
anal_reactor 1 day ago|
An idea occurred to me. What if:

1. Someone raises a PR

2. Entry-level maintainers skim through it and either reject or pass higher up

3. If the PR has sufficient quality, the PR gets reviewed by someone who actually has merge permissions
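
A toy model of that flow; the predicates stand in for human judgment and aren't meant to be automatable:

    # Two-tier triage: a cheap skim gates access to the expensive review.
    from typing import Callable

    def triage(pr: str,
               skim: Callable[[str], bool],
               senior_review: Callable[[str], bool]) -> str:
        """Step 2: entry-level maintainers skim; step 3: merge-rights review."""
        if not skim(pr):
            return "rejected at skim"
        if not senior_review(pr):
            return "rejected at review"
        return "merged"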

More comments...