We stopped AI bot spam in our GitHub repo using Git's --author flag

Posted by ildari 18 hours ago

We stopped AI bot spam in our GitHub repo using Git's --author flag (archestra.ai)

469 points | 228 comments

antran22 5 hours ago|

At this point we should be convinced that it's in GitHub and Macro$lop's narrative to encourage fully automated, LLM-assisted PR bombing, because "muh future of development" and what not. If they really cared about combating spam, they would already have:

- Protected PR submission behind some kind of CAPTCHA

- Given repo owners some way to manage external contributors, instead of forcing them into hacks like the one in this article

Just move to Codeberg, SourceHut, or even GitLab. Serious contributors will go there with you; the lazy people farming GitHub karma with LLMs probably won't.

Muromec 17 hours ago||

How is the status revoked without rewriting git history?

ildari 17 hours ago|

we can block the user in the GitHub UI

bykhun 6 hours ago||

You should release this as a service.

rglullis 16 hours ago||

'I will take "problems that could easily be solved by implementing a Pfand system" for $200, Alex.'

Seriously. Just ask for a US$10 deposit for each PR. If the PR is accepted (not even merged, just accepted as "this is a good effort"), give it back. Hell, give back double the amount for good effort and you've got yourself a cheap way to attract good contributors.

Best case, bots will balk at the payment. Worst case, the funds can be used to hire someone specifically for triage.

godelski 16 hours ago||

This sounds like a great idea until you think about it for more than 30 seconds. Similar to most "it's so easy, you just" ideas.

Seriously, chill, then think about how you'd implement it. Then think about how it'd go wrong. Then think about how to fix those problems. Repeat until you realize there's a better solution or until you solve the problem without making it overly convoluted. More often than not, the former is the better option. More often than not, the latter is just a variant of the sunk cost fallacy and your ego. Reality is (un)surprisingly complex and solutions aren't usually trivial.

dimgl 16 hours ago||

This is an overly negative response to a genuine solution. There are a million reasons you shouldn't do X or Y.

More than likely GitHub would have to maintain their own internal wallet solution for this, which is a big engineering lift. But we're all just having a discussion.

godelski 15 hours ago||

  > to a genuine solution

Except it isn't. It is a lazy and impractical one.

  > More than likely GitHub would have to maintain their own internal wallet solution

Great, so you've even found one of the main issues yourself: it pushes the problem off to a third party, which makes it an impossible solution for anyone but GitHub (and still a problematic "solution" even then).

  > This is an overly negative response

Yet it isn't, because, as even you noted, it's not realistic to implement.

There are two types of lazy, and this is the kind that creates more work, not less.

igsomething 16 hours ago|||

Then people from sanctioned countries are blocked from open source, or worse, you have to explain to the bank and/or the government why you sent 20 USD to someone in Venezuela.

rglullis 16 hours ago||

And here we have another specimen of "things for which crypto is actually useful" spotted in the wild!

LtWorf 14 hours ago||

I think the intersection of the set of people able and interested in contributing and those who are willing to figure out cryptocurrencies is the empty set.

rglullis 14 hours ago||

A quick visit to https://gitcoin.co/campaigns will show you that you are wrong. Hundreds of projects funded by even more people.

Mind you: that's one of the most convoluted ways there is to get involved, because it involves a bunch of smart contract operations and on-chain voting. If we are talking about crypto only as a payment network, things are even simpler.

LtWorf 13 hours ago||

It's cheating if the projects using it are cryptocurrency related :)

A generic Python library used by generic people who have no interest in this field is something else.

rglullis 13 hours ago||

Coinbase reports more than 100M user accounts worldwide. Kraken has ~10M.

Also, we are talking about people who are tech-savvy enough to be interested in participating in a FOSS project. Opening an account at an exchange is not rocket science.

LtWorf 12 hours ago||

No, getting the money out of an exchange if you are in a country the USA doesn't like is rocket science. Which was the whole point of using crypto rather than money.

rglullis 10 hours ago||

The exchanges are for the people that can send the money, not receive it.

And those who are living in countries the USA doesn't like will probably have no trouble learning how to work with crypto. Of all the complex things they need to do to work around the restrictions, setting up a wallet barely registers.

kridsdale1 16 hours ago|||

This is exactly the strategy that the owner of the SomethingAwful forum used in 2004 to get rid of bots and assholes. (I used to remember his name, kinda famous, oh well).

hoistbypetard 16 hours ago|||

https://en.wikipedia.org/wiki/Richard_Kyanka

fullstop 16 hours ago|||

Lowtax, I think is what you're looking for.

applfanboysbgon 16 hours ago|||

This is an evergreen internet comment right here. Condescendingly proclaiming "This problem could be easily solved by [significantly worse solution that had 1/10th the thought put into it as the actual solution by people with a stake in actually solving the problem rather than making quippy armchair comments]".

---

I know it's against convention to comment on downvotes, but really? Really? This is controversial? The OP came up with an elegant solution that cleanly solved their problem without subjecting contributors to anything more than a captcha. Then somebody comes along and says "oh, it's so easy, just charge $10". You're going to set up payment infrastructure, incur administrative overhead with human support managing refunds, and deter 99% of actual humans from contributing, and then call that the easy solution that OP was so stupid for not thinking of first? Give me a fucking break. This site really is just Reddit-lite; anyone who thinks seriously about engineering problems would realise this doesn't stand up as anything beyond a pithy internet solution with three seconds of thought put into what actually implementing it would entail.

rglullis 15 hours ago||

Github already has the payment infrastructure.

Polar.sh is already doing things that are a lot more complex in this space.

If you are in a civilized country that allows direct payments (i.e., anywhere but North America nowadays) and you don't want to deal with GitHub or any external system, there is always the good old "make an M-PESA/SEPA/Pix/UPI transfer to account XYZ".

> the thought put into it as the actual solution by people with a stake in actually solving the problem

Let me flip your argument: think of how much time and thought is poured into problems like this one by people who don't even try to implement a Pfand system beforehand.

applfanboysbgon 15 hours ago||

> Github already has the payment infrastructure.

...which is not available to maintainers to use in this way.

> there is always good old "make a M-PESA/SEPA/Pix/UPI transfer to account XYZ"

And then lock out anyone who is not from the same country as the maintainer, on a platform that is known for its global reach.

Moreover, you're introducing significant anti-human friction. For privacy-conscious people, it's a complete non-starter; I'm not handing over my payment information and compromising my anonymity, not even for a $1 transaction, just to make a PR for the benefit of other people. That's a small subset. Then you have the lazy people. The majority of the population will simply not bother with something if it has friction. Getting out their credit card is one of those things, and it's why products/services that offer free trials or a free tier tend to be overwhelmingly more successful: people want to see a tangible benefit to themselves before they engage in high-friction processes (where "high-friction" includes something as small as requiring a payment, yes). "Free to play" video games with microtransactions engineer first-time purchases to be cheap ($1 or $5) and offer 5x or 10x the value of normal microtransactions, because that first hurdle of getting somebody to hand over their payment information is by far the biggest.

I'll take the captcha, thanks. And maintainers will too, because they'd rather have the solution that filters bots and keeps humans contributing rather than the one that filters out both humans and bots.

rglullis 14 hours ago||

> significant anti-human friction

Yes, that friction is intentional. The lazy people don't want to do it? Great, there is very little chance their contributions are worthwhile. The privacy-conscious people won't do it? Then let them work on their own repositories and complain loudly about the idiot maintainer who puts up these insane barriers. Then the maintainer can go take a look at the forks made by the loud complainers and see if it's worth whitelisting them.

> it's why products/services that offer free trials or a free tier tend to be overwhelmingly more successful

Drug dealers also offer the first hit for free, why don't you use that as an example as well? ;)

To answer this properly in case the quip was too vague: there is no reason for "number of PRs opened by new contributors" to be a viable/interesting KPI for any FOSS project.

> I'll take the captcha, thanks.

First you need to show me all your cool FOSS projects.

filleduchaos 11 hours ago|||

How is it "laziness" to not want to pay $10 to submit a bug fix to your repository?

rglullis 11 hours ago||

1) I just used the term lazy because that's what OP used.

2) You are not "paying" $10. The money would be returned to you. In case you haven't heard of Pfand systems: https://en.wikipedia.org/wiki/Container-deposit_legislation

halapro 15 hours ago|||

Possibly the worst idea I've heard this month.

No one, meat or chip, would just set aside $10 "for the opportunity to contribute"

This is "let them eat cake" level of out of touchness.

rglullis 13 hours ago||

I have 3 PRs on https://github.com/django-oauth/django-oauth-toolkit/pulls that haven't been merged for OVER A YEAR because the maintainers are overloaded and expected to work on this for free. The fact that these PRs are not being promptly reviewed has cost me at least 3000€ in potential grant work.

If I was told that I could make a deposit of $10 to get less stressed maintainers and a faster PR review cycle, I wouldn't even blink. I wouldn't even ask for the money back.

nijave 7 hours ago||

How did it cost you money in grant work? Can't you just fork and use that?

rglullis 2 hours ago||

Not exactly. NLnet was funding this to get all these different OAuth RFCs nicely integrated with DOT. A lot of my work was already done in separate packages; putting up yet another fork without the blessing of the DOT core team would not be of much use to the overall community.

backwardsponcho 15 hours ago|||

Yeah, because we'd hate to allow people from poor countries to contribute to FOSS projects, right?

Or teenagers without full access to online banking.

Or the unemployed.

rglullis 13 hours ago||

Oh, give me a break. No one is taking the ability from others to fork the repo. If these exceptional cases really were to happen, how fast would it be for someone else to notice and do one of (a) notify the maintainers to get this particular user whitelisted or (b) front the entry costs?

backwardsponcho 11 hours ago||

Sounds like bandaids on top of bandaids, at which point you start to wonder if the idea is fundamentally broken.

rglullis 9 hours ago||

> Sounds like bandaids on top of bandaids

And a registration system that amounts to a more complicated captcha doesn't? How long until someone starts farming accounts and running bots that jump through these hoops as well?

> wonder if the idea is fundamentally broken.

It's only "fundamentally broken" if you need a perfect system that accepts 100% of legit PRs without adding any friction.

But we don't need that. Pfand systems are not meant to be perfect, and they are not meant to be the single solution to every problem involving the commons. They will not get rid of all bad behavior, but they will certainly bring a global-scale problem down to levels that can then be managed by other, smaller, context-aware systems.

hoistbypetard 16 hours ago|||

I semi-regularly offer drive-by PRs to projects I like and use. They're real PRs, not generated with AI. They range from papercuts to doc fixes to attempts to add that one feature that I want. Sometimes it's a drive-by group of PRs. Or an issue and a PR. I try to conform to what the maintainers prefer.

Unless I knew the maintainers personally, this would prevent most of my contributions, which are most often accepted. Maybe it's worth losing out on my small contributions to avoid slop. But things would absolutely be lost this way.

smaudet 16 hours ago||

Agreed... However there's not a good IsThisAnAI() test at present. So unfortunately, we will have to use anti-spammer techniques (because that is exactly what AI is, high(er) quality spam).

bluGill 16 hours ago|||

Why should I trust you to give me my deposit back?

rglullis 16 hours ago||

Because the payoff isn't high enough to justify the reputational cost of a large-scale scam operation?

If you don't trust the maintainer, you can always fork a repo and let them merge on their own.

corps_and_code 15 hours ago|||

Interesting idea, I wonder about using it myself.

Let's say I'm a maintainer of an open source project on Github/Gitlab. How would you actually implement this deposit-refund loop in practice?

rglullis 14 hours ago||

I believe you are asking me in jest, but if you are genuine, this is what I would add to my CONTRIBUTING.md

```
# First-time contributors

Due to the increased number of AI bots and low-effort contributions, we are being forced to add some friction for first-time contributors. PRs are closed for anyone not explicitly added to our list of authorized users.

To be accepted in the list, you must do one of the following:

- Show a history of meaningful contributions to projects in related technologies, made before Jan 1st, 2023.
- Be vouched for by one of the existing contributors in the core team.
- (If you have GitHub Sponsors/Polar/Patreon) Have been a sponsor of the project for the last 3 months.
- Submit a small payment, which will be held in escrow until your PR is accepted. The following methods are accepted (choose all that apply: PayPal, SEPA, crypto, Venmo, Pix, UPI, M-Pesa, etc.)
```

corps_and_code 14 hours ago||

Oh no, I'm being genuine. Documenting the process itself wasn't what I was curious about, I was wondering how you'd go about your last bullet. Accepting lots of currencies can be hard, but I guess I'm not super familiar with online escrow services. I'm not sure how simple they can make that process, or who would pay the cost of using them (I assume they're not free).

I was also wondering how automated or manual you envision the review process being. I'm guessing your hope is that the small deposit would stem the flow of submissions enough to make manual review feasible again, and that you would also manually return all the payments sent to escrow?

rglullis 14 hours ago||

Yes, I'm assuming that adding a payment requirement would bring the number of requests down to a level I could manage with a simple spreadsheet.

Paypal/SEPA transfers are free in Europe. And even if I lived in the US and had to pay a small processor fee, I'd be more than willing to cover the $0.50 in fees if that meant I was receiving contributions from people who went through all the trouble.
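The spreadsheet-based deposit/refund loop described above could be sketched as a tiny ledger like the one below. The `DepositLedger` class, its method names, the flat $10 amount, and the policy of keeping deposits from rejected PRs are illustrative assumptions, not a feature of GitHub or of any escrow service; the actual money movement (PayPal, SEPA, etc.) happens outside the code.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    HELD = "held"          # deposit received, PR awaiting human triage
    REFUNDED = "refunded"  # PR accepted as a good-faith effort
    KEPT = "kept"          # PR rejected; deposit retained (e.g. to fund triage)


@dataclass
class DepositLedger:
    """Hypothetical spreadsheet-style ledger for a per-PR deposit ("Pfand")."""
    amount: float = 10.0
    # PR number -> (contributor, current status)
    entries: dict[int, tuple[str, Status]] = field(default_factory=dict)

    def receive(self, pr_number: int, contributor: str) -> None:
        """Record that a contributor paid the deposit for a PR."""
        if pr_number in self.entries:
            raise ValueError(f"PR #{pr_number} already has a deposit")
        self.entries[pr_number] = (contributor, Status.HELD)

    def settle(self, pr_number: int, accepted: bool) -> float:
        """Close out a held deposit and return the amount owed back to the contributor."""
        contributor, status = self.entries[pr_number]
        if status is not Status.HELD:
            raise ValueError(f"PR #{pr_number} was already settled")
        self.entries[pr_number] = (
            contributor,
            Status.REFUNDED if accepted else Status.KEPT,
        )
        return self.amount if accepted else 0.0

    def retained_total(self) -> float:
        """Total kept from rejected PRs, e.g. to pay someone for triage."""
        kept = sum(1 for _, status in self.entries.values() if status is Status.KEPT)
        return self.amount * kept
```

For example, after `receive(101, "alice")` and `receive(102, "spam-bot")`, settling PR 101 as accepted returns 10.0 to refund, settling PR 102 as rejected returns 0.0, and `retained_total()` is 10.0. Refusing to settle twice is a deliberate choice so the same deposit can't be refunded by mistake.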

Barbing 16 hours ago|||

$10 to a Silicon Valley software engineer reading this comment may feel like... $(a lot?)... to a range of other would-be contributors (thinking of $6/day minimum wages in some places, for example)

Wonder if a dollar would work for now until more people give bots credit cards.

skrebbel 16 hours ago|||

Easily? You think the kind of people who think it makes sense to make bogus slop PRs are going to react reasonably to overburdened volunteer maintainers refusing to give them their US$10 back?

rglullis 16 hours ago||

Yes. Once a PR is rejected, contact from that bot is blocked. No appeals.

skrebbel 14 hours ago|||

This is never going to work. Sufficiently many of these people are going to find maintainers' home addresses and send them death threats and the like. If you see how badly some people flip out just because their PR is rejected, it's going to be much, much worse if their PR is rejected and their money is taken.

rglullis 13 hours ago||

Ok. If I'm the maintainer receiving death threats over that, I'd tell them they would get the $10 back, plus some extra money for their troubles.

Location of the envelope with the money: the same police station where I'd reported the death threats.

rtdq 16 hours ago||||

The worst case is that someone loses out on $10, no? How does this work if the maintainer is the swindler?

smaudet 16 hours ago||

I don't think that is a (very realistic) concern. AI is slop; the problem is not that real contributors are struggling to get PRs merged.

The bigger issue is raising the bar for students who might otherwise have had productive careers (but education is a general issue, where the students don't even yet recognize they are being scammed).

rtdq 16 hours ago||

I don't follow, and I'd be concerned that this opens up a cottage industry of bots generating plausible looking repositories that unwitting contributors would attempt to contribute to. We already know that bots are astroturfing repos to generate overinflated star counts. I'd say the least crap option here is to honeypot PR contributions from bots

smaudet 15 hours ago||

This feels like bot logic, lol.

Unless the contributors don't care about the repos they contribute to, this is not a likely scenario. AI doesn't care. We do.

rtdq 15 hours ago||

What is bot logic exactly?

You keep describing this as not a likely or realistic scenario. But why is the likelihood even relevant here? The way to avoid the worst case, i.e. being scammed out of your money, is to not put it on the table in the first place.

smaudet 14 hours ago||

> What is bot logic exactly?

Ill-thought-out logic like your own. I think you are likely a bot at this point.

It's not likely, because that's not something that people are likely to do. Only a bot like yourself with a poor model of the world will do this type of thing. It will be amusing to see the AI bots trying to run the scam you are describing and then nobody will contribute to the fake projects... except other fake AI contributors.

rtdq 13 hours ago||

Dude, you're claiming that there's no likelihood of people getting swindled out of their money by handing it over to strangers. So your reaction is to play the bot card? We're done. You're clearly not arguing in good faith here.

rglullis 12 hours ago||

> people getting swindled out of their money by handing it over to strangers.

I think what OP is trying to say is that there is very little reason for a human to go through the trouble of contributing to a "plausible-looking fake repo".

To get to the point that a repo starts to attract interest from other contributors, that project needs to have actual utility.

Who in their right mind would jump into opening a PR on projects they've never used? And if the project does get used to the point that it attracts people interested in contributing to improve it, wouldn't that mean we've achieved https://xkcd.com/810 ?

godelski 16 hours ago|||

So I pay $10 when your bot fucks up?

That's called theft. And for what, one banana?

rglullis 16 hours ago||

Obviously, the triage should be done by a human and not automated.

godelski 14 hours ago||

Doesn't that put us into the same position?

Let's also be realistic, everything that can be automated will. Even if that thing is worse off for it. There's a clear historic pattern of this. Companies and people love to be penny wise and pound foolish.

rglullis 14 hours ago||

> Doesn't that put us into the same position?

Of course not, because the number of low-quality PRs with $10 attached to it will be lower than whatever number of PRs are being created now.

godelski 11 hours ago||

You also lose out on a lot of would-be PRs, from people who don't have the money, don't have trust, or have a visceral "fuck you" stance. There are a lot more reasons why this suggestion creates a gate that dissuades the people you want. I stand by my view that the solution is naïve, but you're welcome to give it a try on your projects. I'm sure it'll be effective at reducing a lot of spammers, but I'm also pretty convinced it'll come with a large false positive rate, which is invisible (giving you false confidence).

rglullis 10 hours ago||

> You also lose out on a lot of would-be PRs.

I am more concerned about the sustainability of the projects as a whole than trying to optimize the number of potential random PRs.

> you're welcome to give it a try on your projects.

It's not exactly the same, but in a way I'm already doing that with Communick. I'm running one of the few Matrix and Fediverse services where members must pay to have access. Up until last week, I was giving 14 days as a free trial period, with no deposit or confirmed subscription required. But now, because of AI bots, I dropped it and I am collecting payment info before activating any account.

If I were playing the startup playbook, that would be insane. It's already crazy to try to charge for something that people are used to getting for free; my conversion rates are already low, and requiring credit card info will make them even lower.

At the end of the day, I don't care. At first I was really hoping this would be something profitable; right now I just keep it running because I can. Even with the small number of users the servers get, I collect enough to cover the better part of operational costs, I get to sharpen my devops skills, and I have a test lab to learn what it is that people really want (i.e., what they are willing to pay to have solved) vs. what people claim to be a problem.

All in all, the Communick instances are not going to win any popularity contest, but my servers have been up and running without major issues or drama for more than 6 years, and that's a lot more than I can say for all the other servers that have come and gone because the admins tried to play the numbers game.

godelski 9 hours ago||

  > trying to optimize the number of potential random PRs.

You're misrepresenting my comment. I didn't say we need to optimize, just consider. Don't strawman me here.

You can't just hand wave them away as if this isn't an important factor. If you don't care about them at all I got a much much simpler solution: don't allow issues or PRs. Problem solved! But that's not a real solution either

rglullis 8 hours ago||

> You can't just hand wave them away as if this isn't an important factor.

There are plenty of ways to indicate in the project that the Pfand is meant as one way to filter out bad actors, but it doesn't mean that it should be the only way to accept external contributions. You can find somewhere else on the thread where I listed some alternatives that can be used as well.

> If you don't care about them at all I got a much much simpler solution: don't allow issues or PRs.

Yes, and what is the problem with this solution? That's what many projects are doing and many more will do. They will close access to non-members and only accept someone new when they have some type of social proof. [0] And that is totally fine.

[0]: https://news.ycombinator.com/item?id=43423063

godelski 7 hours ago||

If I have to trust you to give me back my $10, I'm never contributing to your code. Ever.

If I have to trust GitHub to give me back my $10, frankly, I have more trust for a random person on the internet at this point.

Also, you glossed over my banana joke, but it did hold meaning[0]

  > Yes, and what is the problem with [closing down PRs and Issues] solution?

Are you serious? I mean, it is an acceptable solution, but it's completely orthogonal to the one we've been discussing. I can see you're not serious. I was skeptical because of the first comment, but thanks for making that clear now.

[0] https://www.youtube.com/watch?v=Nl_Qyk9DSUw

rglullis 3 hours ago||

> If I have to trust you to give me back my $10, I'm never contributing to your code. Ever.

That's the whole point! Either you are willing to trust the money to someone you don't know, or you will have to find another way to establish social proof.

> you glossed over my banana joke,

Sorry, not the biggest AD fan. Anyway, I am saying people will have to "pay up or establish social proof". Charging a family member is not the same.

zarzavat 16 hours ago|||

What? No. A PR is me giving my time to the project. I don't get anything out of it except the warm feeling of having helped out. If I have to pay money to submit a PR then I'm going to play video games instead.

rglullis 12 hours ago||

> A PR is me giving my time to the project

Unfortunately, the issue is that time is not enough of a filter anymore. The time from machines is basically worthless compared to yours, so you need to give something else, and that something else needs to be something that shows you have actual skin in the game.

Lyrkan 11 hours ago||

> so you need to give something else

Well no, they don't need to. As they said they could just do something else instead of contributing (and I know I would too).

Your proposal would just end up killing those open source projects even more than what you are trying to solve.

rglullis 10 hours ago||

> Your proposal would just end up killing those open source projects

Name 3 open source projects that are dependent on a continuous stream of first-time contributors to keep going.

tourist2d 16 hours ago||

[dead]

xivzgrev 12 hours ago||

I like how they are taking a stand against vanity metrics. Rare to see that these days

cemoktra 14 hours ago||

AI company annoyed by AI ... Surprise

optionalsquid 17 hours ago||

I don't have a better solution, unfortunately, but it doesn't seem like the spam problem has been solved. It has just been moved from pull requests to commits:

Currently, more than 10% of all commits in the archestra repo are essentially noise (369 of 3521 commits), accounting for more than half of all commits in the last month (303 of 578 commits).

But maybe (probably) the amount of such commits will go down over time, compared to the growing amounts of AI slop

ildari 17 hours ago|

Since those commits are made by our system, they don't create any noise for us the way PRs, issues, and email notifications do. We only include real people who could solve the captcha, and their input is mostly valuable.
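This thread doesn't spell out how archestra's system creates those commits. Assuming, hypothetically, that it records each person who passes the captcha as an empty commit attributed to them with `git commit --author`, the invocation could be built like this. `--allow-empty` and `--author` are real `git commit` options; the function names and commit message are made up for illustration. GitHub links a commit to an account through the author email, which is presumably what makes the contributor count as a prior contributor for interaction limits.

```python
import subprocess


def onboarding_commit_argv(name: str, email: str) -> list[str]:
    """Build a hypothetical `git commit` call crediting a verified contributor.

    --allow-empty lets the commit exist without changing any files, and
    --author sets the commit's author to the contributor, while the committer
    stays whatever identity the onboarding system is configured with.
    """
    # Angle brackets would break git's "Name <email>" author format.
    if any(c in name + email for c in "<>"):
        raise ValueError("name and email must not contain angle brackets")
    return [
        "git", "commit", "--allow-empty",
        f"--author={name} <{email}>",
        "-m", f"Onboard contributor {name}",
    ]


def record_contributor(name: str, email: str, repo_dir: str) -> None:
    """Create the commit inside an existing clone; pushing it is a separate step."""
    subprocess.run(onboarding_commit_argv(name, email), cwd=repo_dir, check=True)
```

Passing an argument list (rather than a shell string) to `subprocess.run` keeps contributor-supplied names from being interpreted by a shell.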

kazinator 14 hours ago||

> Final Words

> While GitHub reports massive metric growth — a substantial part of which is AI-generated — we as an open source project team have to do the heavy lifting of cleaning up AI slop from our repository and come up with esoteric workarounds to keep the level of legitimacy of our open source audience.

AI generated slop!

exabrial 16 hours ago||

Signed Commits from known authors would also help!

zzzeek 17 hours ago|

so...they are manually resetting the "interaction limits" over and over again, since they are only temporary?

why not use hooks to automatically reject issue comments / PRs etc. from users that didn't go through onboarding, rather than repurposing GH features that aren't really designed for that use (and are hence in danger of being changed someday)?
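The hook-based alternative suggested above boils down to a check like the following, run on each webhook delivery. The `event` argument is the value of GitHub's `X-GitHub-Event` header, and `action` and `sender.login` are standard fields in GitHub webhook payloads; the allowlist and the `should_close` helper are hypothetical, and actually closing the PR or deleting the comment would need a separate API call that isn't shown. As ildari's reply points out, a hook like this only fires after GitHub has already emailed subscribers, so it can clean up spam but not suppress the notifications.

```python
# Event types and actions that represent a new contribution from someone.
WATCHED_ACTIONS = {
    "pull_request": {"opened", "reopened"},
    "issues": {"opened"},
    "issue_comment": {"created"},
}


def should_close(event: str, payload: dict, allowlist: set[str]) -> bool:
    """Return True if this delivery comes from a user who skipped onboarding.

    `allowlist` is a hypothetical set of GitHub logins that completed the
    captcha-based onboarding.
    """
    if payload.get("action") not in WATCHED_ACTIONS.get(event, set()):
        return False  # not a new PR/issue/comment, nothing to police
    login = payload.get("sender", {}).get("login", "")
    # GitHub logins are case-insensitive, so compare them lowercased.
    return login.lower() not in {name.lower() for name in allowlist}
```

For instance, an `opened` pull request from `bot123` with an allowlist of `{"Alice"}` is flagged, while the same event from `alice`, or a `closed` event from anyone, is not.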

ildari 16 hours ago|

GH sends the email notification to all subscribers at the moment a comment is posted. There is no cooldown, and no way to unsend the notification using hooks.
