We stopped AI bot spam in our GitHub repo using Git's --author flag

Posted by ildari 17 hours ago

We stopped AI bot spam in our GitHub repo using Git's --author flag (archestra.ai)

468 points | 227 comments

hiccuphippo 17 hours ago

The irony of the .ai domain.

nonethewiser 16 hours ago

I don't think there's anything ironic about it, because they aren't suggesting AI is bad, just that it can be misused.

wafflemaker 16 hours ago

Thanks for pointing that out. It had eluded me, and it's incredibly funny.

edfletcher_t137 13 hours ago

Not just the domain: it's an agentic stack! In other words, I could use their product to create the exact type of PRs they're lamenting here.

dbgrman 16 hours ago

Also, could the website please fix its scrolling code? It's annoying; I can't read the article.

motakuk 16 hours ago

Would love to! Could you please share more? I can't quite see the issue

bakugo 15 hours ago

"I never thought AI would slop my project!" says the company centered around AI slop.

zer0tonin 17 hours ago

> Should we stop giving fun test tasks to our job candidates?

Yes

FartyMcFarter 16 hours ago

It seems this particular company makes a payment for completing those tasks, so it might not be that bad.

motakuk 14 hours ago

We do; it's part of our hiring pipeline: https://archestra.ai/careers

jbellis 15 hours ago

Developers: stop doing whiteboard interviews, they don't measure anything relevant to the real job

Also devs: stop giving us real world problems to solve

gabeio 5 hours ago

Those are the only two options for finding quality candidates?

Try talking more about the meta of coding itself. Get into the developer's head by _talking_ to them and understanding how they would approach and attack different problems. You can show them code and ask them what they would do differently, or how they would go about implementing X-Y-Z. Just because you can write foobar doesn't mean you understand how to apply algorithms or handle whatever specific problems [your] team has. It's _far_ better to understand how they would solve a problem than to focus on their syntax anyway.

Chaosvex 17 hours ago

Yeah, fun for who exactly?

dymk 11 hours ago

Me. That sounds way more fun than inverting a binary tree, and they pay candidates for their time.

jart 16 hours ago

This is a great example of the toxic effect money has on open source. Reward people with respect and recognition instead. Weird anonymous accounts no one's ever heard of will leave, because someone (or something) who's concealing their identity has nothing to gain from recognition. Honestly, GitHub should have a real-names policy.

Because if you're not Satoshi Nakamoto, there are only three reasons I can think of to be anonymous on GitHub: (1) to avoid obtaining your employer's authorization, (2) to spam, harass, and engage in toxic behaviors, or (3) you're not even human. All three of these are the last things I want when engaging on the GitHub platform. Don't get me wrong, I love robots. But I'm perfectly capable of talking to the robot on my own. I don't want to talk to your robot. I also don't want people slipping me intellectual property under the table without their employer's consent. And I certainly don't enjoy all the hate and harassment.

GitHub has tried to help with the last part by making overt displays of hate something that can get you in trouble. The issue is that people just get more guileful with more anonymous accounts, because the issue was never disrespect (which can actually be strategic and pro-social if we look at Torvalds' career), but rather bad-faith participation. If GitHub can guarantee that all its users are humans acting in good faith under their real names, then we might be able to start talking about open bounties.

pabs3 5 hours ago

> someone (or something) who's concealing their identity has nothing to gain from recognition

The xz supply chain attacker hid their real identity, created fake ones, and gained recognition over time in order to obtain more access and add the backdoor. So TLAs (three-letter agencies) and other bad actors, at least, are interested in gaining recognition.

jart 5 hours ago

I know, right? It's like, finally: a threat actor who's intelligent enough to understand what capital means in the open source community and is willing to devote resources to engaging with it authentically (even if it's for evil, nefarious ends). The xz incident showed that the open source community has many other good defense mechanisms for verifying and spotting malicious work and then resolving it. But we won't even get to play that game if we're inundated with anonymous agent spam so that GitHub can juice its MAU numbers. Maybe they should require every account to buy a $40 YubiKey. I don't know what the answer is. But I know that no one gains when your measure of success is driving the cost of burning out open source developers down to literally zero.

pabs3 3 hours ago

The xz incident was only discovered by accident, not by someone actually verifying that the tarball and test cases were not malicious. We still don't have verification of tarball build reproducibility anywhere. The closest you can get to verified builds is what the bootstrappable builds community built in hex0/stage0, and what stagex built on top of that. I'm guessing even they haven't read through all that source code, though. There aren't even good tools for distributing reviews: there is crev, but the stagex folks think it has some deficiencies.

https://news.ycombinator.com/item?id=47701394
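A naive first step toward the tarball check pabs3 mentions is to diff a release tarball against a checkout of the matching tag. The Python sketch below lists files that exist only in the tarball or whose bytes differ from the checkout; the helper name `tarball_drift` and the assumption that the archive wraps everything in one top-level directory are illustrative. It is not build-reproducibility verification: autotools release tarballs legitimately ship generated files such as `configure`, so every hit still needs a human to look at it.

```python
import hashlib
import tarfile
from pathlib import Path


def _sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def tarball_drift(tarball: str, checkout: str) -> dict:
    """Report tarball files that are absent from, or differ from, a checkout.

    Hypothetical helper; assumes the archive wraps its contents in a single
    top-level directory such as "project-1.0/".
    """
    root = Path(checkout)
    report = {"only_in_tarball": [], "modified": []}
    with tarfile.open(tarball) as tar:  # compression is auto-detected
        for member in tar.getmembers():
            if not member.isfile():
                continue  # ignore directories, symlinks, devices
            parts = Path(member.name).parts[1:]  # strip "project-1.0/"
            if not parts or ".." in parts:
                continue  # skip empty names and paths escaping the checkout
            rel = Path(*parts)
            # Read the member into memory; nothing is extracted to disk.
            data = tar.extractfile(member).read()
            local = root / rel
            if not local.is_file():
                report["only_in_tarball"].append(rel.as_posix())
            elif _sha256(local.read_bytes()) != _sha256(data):
                report["modified"].append(rel.as_posix())
    for paths in report.values():
        paths.sort()
    return report
```

Reading members in memory instead of calling `extractall` keeps a hostile archive from writing anywhere on disk while it is being inspected.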

pabs3 3 hours ago

I don't know what the solution to slop is. Maybe the bubble will implode at some point. Until then, just close down issues/pulls or remove projects from GitHub I guess.

embedding-shape 16 hours ago

Sounds kind of weird that the blog post complains about `poisoning the conversation with pointless "implementation plans"` when they literally ask for that: they attached a $900 USD bounty to a very under-specified issue, and even reply "Do you have an implementation plan in mind?" to some of the first "attempters". Sounds like they got exactly what they'd been asking for, and even before LLMs, if you had pulled something similar, the effects would have been similar.

bradley13 15 hours ago

It's fine for developers to provide a plan, even if it gets rejected. The problem comes when every script kiddie figures AI has made them into a developer.

Imagine you want to get a doctor's opinion, or maybe a couple of opinions. But a zillion AI-amateurs have registered themselves as doctors. How do you separate wheat from the chaff?

embedding-shape 15 hours ago

> Imagine you want to get a doctor's opinion, or maybe a couple of opinions. But a zillion AI-amateurs have registered themselves as doctors. How do you separate wheat from the chaff?

Right, but that's not what happened though.

Someone went to the public square, said "Hey, I'm looking for any sort of doctor, and I'll pay you $900 if you tell me your plan, and then whichever plan I choose wins", and then got surprised when they were flooded by a zillion AI-amateurs.

You don't generate a ton of chaff then try to find the wheat, you ensure your process doesn't generate a ton of chaff in the first place. Offering large monetary rewards for relatively simple work for anyone in the public is bound to generate a ton of chaff...

nubinetwork 14 hours ago

While git has always allowed this, I don't really like the idea that someone can write some code, slap my name on it, and push it to their repo.
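What nubinetwork describes is the mechanism the title refers to: `git commit --author` records whatever name and email the person running it supplies, while the committer field still comes from that person's own configuration. The Python sketch below only builds the argument list for such a commit; the helper name `author_commit_argv`, the use of an empty commit, and the sample identity are illustrative assumptions, not the article's actual CI script.

```python
def author_commit_argv(name: str, email: str, message: str) -> list:
    """Build a `git commit` invocation that credits someone else as author.

    Hypothetical helper. --author only sets the author field; the committer
    is still whoever runs the command, and nothing here proves the named
    person wrote anything.
    """
    if any(c in name + email for c in "<>"):
        raise ValueError("angle brackets would corrupt the 'Name <email>' ident")
    return [
        "git", "commit",
        "--allow-empty",  # assumption: a marker commit with no file changes
        f"--author={name} <{email}>",
        "-m", message,
    ]
```

Inside a checkout, `subprocess.run(author_commit_argv(...), check=True)` would create the commit. Passing an argument list rather than a shell string keeps a contributor-supplied name from being interpreted by a shell.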

delduca 14 hours ago

You can sign your commits with OpenPGP.

codazoda 14 hours ago

I think this is why signed commits are also supported. My first thought was that this probably doesn't work with signed commits. But maybe it does, since they are listed as the committer.

mschuster91 14 hours ago

Yup. At the very least, the "big dogs", aka GitHub and GitLab, should allow you to "claim" an email address for an account and only link a commit to it when the commit in question was either authored directly from the web UI or cryptographically signed.
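mschuster91's proposal boils down to a small decision rule. The Python sketch below models it; `Commit`, `should_link`, and the `claims` mapping (lowercased email to account name) are hypothetical, signature verification is assumed to have happened elsewhere (for instance via `git verify-commit`), and none of this describes how GitHub or GitLab actually attribute commits today.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    author_email: str
    web_ui_user: Optional[str] = None      # account that made it in the forge's web UI
    verified_signer: Optional[str] = None  # account whose key verified the signature


def should_link(commit: Commit, claims: dict) -> Optional[str]:
    """Return the account to credit, or None if the commit stays unlinked."""
    owner = claims.get(commit.author_email.lower())
    if owner is None:
        return None  # nobody has claimed this address
    if owner in (commit.web_ui_user, commit.verified_signer):
        return owner  # the claimant provably made or signed this commit
    return None  # claimed address, but no proof the claimant was involved
```

Under this rule, a drive-by `--author` commit naming someone else's claimed address simply shows up unlinked instead of being attributed to them.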

foresto 11 hours ago

> If the email matches their GitHub account, GitHub links the commit to their profile and grants them contributor status.

When the article mentioned email matching, I was concerned that it would break down when a contributor's email address changes. (I have contributed to more than a few projects over the years, using email addresses that no longer exist.)

However, it looks like they're not using the email address recorded in the author's original git commit, but instead a GitHub-generated address whose unique parts are the GitHub user ID and username. That should survive authors changing their email addresses. It would still break down if a contributor loses access to their account and has to create a new one, but that's probably less common.
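For reference, GitHub's private noreply addresses take the form `ID+USERNAME@users.noreply.github.com`, while some older accounts use plain `USERNAME@users.noreply.github.com`. A small Python parser (the names `parse_noreply` and `NoreplyIdentity` are illustrative) shows that the address encodes the account rather than any mailbox the contributor controls, which is why it survives a change of personal email.

```python
import re
from typing import NamedTuple, Optional

# Optional "<digits>+" prefix, then the login, then GitHub's noreply domain.
_NOREPLY = re.compile(
    r"^(?:(?P<id>\d+)\+)?(?P<login>[A-Za-z0-9-]+)@users\.noreply\.github\.com$",
    re.IGNORECASE,
)


class NoreplyIdentity(NamedTuple):
    user_id: Optional[int]  # None for the older USERNAME-only form
    login: str


def parse_noreply(email: str) -> Optional[NoreplyIdentity]:
    """Split a GitHub noreply address into numeric ID and login, if it is one."""
    m = _NOREPLY.match(email.strip())
    if m is None:
        return None  # an ordinary mailbox, not a noreply address
    uid = m.group("id")
    return NoreplyIdentity(int(uid) if uid else None, m.group("login"))
```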

ildari 17 hours ago

Hi HN community, I wanted to share our approach to reducing the amount of AI slop PRs and issues in our repo. We enabled the "require prior contribution" flag on GitHub and created a CI script that creates a tiny commit co-authored with you if you pass a captcha on our website. It worked really well, and we were able to block at least 500 bots in the first week. Sharing a screenshot from Cloudflare: https://archestra.ai/hn-comment-cloudflare-challenge-outcome...
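ildari doesn't say which Cloudflare product sits behind the challenge. Assuming it is Turnstile, the server-side half of "if you pass captcha" could look like this Python sketch, which posts the widget's token to Turnstile's siteverify endpoint and accepts only a successful result issued for the expected hostname. The function names and the 10-second timeout are assumptions, not the article's code.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

SITEVERIFY_URL = "https://challenges.cloudflare.com/turnstile/v0/siteverify"


def build_siteverify_request(secret: str, token: str,
                             remote_ip: Optional[str] = None) -> urllib.request.Request:
    """Form-encoded POST carrying the widget token to Turnstile's siteverify."""
    fields = {"secret": secret, "response": token}
    if remote_ip:
        fields["remoteip"] = remote_ip  # optional extra signal for Cloudflare
    return urllib.request.Request(
        SITEVERIFY_URL,
        data=urllib.parse.urlencode(fields).encode(),
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def passed_challenge(body: bytes, expected_hostname: str) -> bool:
    """Accept only a successful result that was issued for our own hostname."""
    result = json.loads(body)
    return result.get("success") is True and result.get("hostname") == expected_hostname


def verify_token(secret: str, token: str, expected_hostname: str,
                 remote_ip: Optional[str] = None) -> bool:
    """Network call: send the token and interpret Cloudflare's JSON reply."""
    req = build_siteverify_request(secret, token, remote_ip)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return passed_challenge(resp.read(), expected_hostname)
```

Only when `verify_token(...)` returns True would the site go on to trigger the workflow that adds the co-authored commit; keeping request building and response parsing separate makes both testable without network access.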

satvikpendem 17 hours ago

Yep, this is similar to some other version control tools like Tangled which has vouching.

https://blog.tangled.org/vouching/

halapro 14 hours ago

Who do you add as a contributor though? Wannabe-contributors? Then they appear in the list of contributors before you even see if they're capable of producing an acceptable PR.

Your solution would be great if GitHub would also allow me to whitelist specific users, but unfortunately this still won't block "implementation plans" in comments.

tln 17 hours ago

That's a really elegant solution.

How does the website trigger the CI script? Through the GitHub REST API?

ildari 17 hours ago

Thank you! Yep, through the REST API. Here is the example: https://github.com/archestra-ai/website/blob/29ebdacbd8a22b9...
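The linked file is truncated above, so the exact call isn't shown. One way a website backend could start a GitHub Actions run is the REST API's workflow-dispatch endpoint; the Python sketch below assumes that route rather than, say, `repository_dispatch`. The owner, repo, workflow file name, `github_login` input, and token are all placeholders, and the target workflow would have to declare an `on: workflow_dispatch` trigger with a matching input.

```python
import json
import urllib.request

API = "https://api.github.com"


def build_dispatch_request(owner: str, repo: str, workflow_file: str, ref: str,
                           inputs: dict, token: str) -> urllib.request.Request:
    """POST /repos/{owner}/{repo}/actions/workflows/{workflow}/dispatches."""
    url = f"{API}/repos/{owner}/{repo}/actions/workflows/{workflow_file}/dispatches"
    payload = {"ref": ref, "inputs": inputs}  # ref: branch or tag holding the workflow
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",  # placeholder token
            "X-GitHub-Api-Version": "2022-11-28",
            "Content-Type": "application/json",
        },
    )


# Hypothetical values: only run this after the captcha check has passed.
req = build_dispatch_request("example-org", "example-repo", "grant-contributor.yml",
                             "main", {"github_login": "octocat"}, "TOKEN")
```

Sending it is `urllib.request.urlopen(req)`; GitHub acknowledges an accepted dispatch with an empty HTTP 204 response.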

_joel 17 hours ago

Wouldn't it be trivial to farm the stats needed to pass the bot checker's threshold?

aizk 16 hours ago

I'm not sure why gh hasn't already implemented stricter measures / filters / tools for PRs. It would cut down on spam and also help save their servers that can't handle the increased AI load!

jagged-chisel 15 hours ago

Repos get forked, code gets pushed, all before a PR is created. What kind of measures can be implemented to cut down on the AI-generated forks and pushes?

halapro 14 hours ago

You can fork and push all you want. The problem is specifically when you show up in my notifications with your junk PR.

jagged-chisel 10 hours ago

The issue for GH isn’t your PR spam. It’s all the other operations before your PR spam ever arrives.

antran22 4 hours ago

At this point we should be convinced that it's in GitHub and Macro$lop's narrative to encourage fully automated, LLM-assisted PR bombing, because "muh future of development" and what not. If they did care about combating spam, they would already have:

- Protected the PR submission feature behind some CAPTCHA

- Given repo owners some way to manage external contributors, instead of forcing them to use hacks like the one in this article

Just move to Codeberg, sr.ht, or even GitLab. Serious contributors will go there with you; the lazy people farming GitHub karma with LLMs probably won't.
