Posted by ildari 16 hours ago
A possible fix would be to add a paths-ignore in on-commits-to-main.yml:
paths-ignore:
- 'EXTERNAL_CONTRIBUTORS.md'
I am no expert in this, it's just something I noticed.
> When requiring approvals only for first-time contributors (the first two settings), a user that has had any commit or pull request merged into the repository will not require approval. A malicious user could meet this requirement by getting a simple typo or other innocuous change accepted by a maintainer, either as part of a pull request they have authored or as part of another user's pull request.
If companies can screw you over and claim it's a mistake, there isn't much a person can do.
It's all about levels of trust: a maintainer going rogue is unlikely, a past contributor going rogue is more likely but not by much, a stranger with one typo PR merged is more likely still, and a complete stranger is the least trustworthy.
If you are insecure because someone has had one of their otherwise completely innocent PRs merged into your repo... you are insecure, period.
Also please let us delete PRs just like we can delete issues.
Your suggestion would help a bit but I would prefer the opposite: before someone can 'pollute' my pull request space and draw attention from subscribers I would prefer an acceptance step (just like a moderator on a forum) instead of having to archive the PRs.
This is especially important as (AI) spam increases and just because I am away for a few days or weeks I don't want those PRs lurking around.
My usual experience is this:
1. We open an issue that needs to be fixed
2. slop bots create multiple slop PRs
3. slop bots spam comments on the issues, pointing to their slop PRs
The only general methods for preventing this are restricting PRs (not comments, I believe) to contributors - which is a hassle to maintain, and restricting to older accounts - which doesn't work because the bot accounts are not newly created.
Then we need to perform _way too many_ steps just to get rid of the slop:
- navigate multiple pages and confirmations to ban the account from our org
- open each PR manually
- close it manually
This takes at least 15 clicks and is made _so much worse_ by how slooooooooow the UI is. Every click takes 2 seconds!!! How can "ban this account and delete everything it ever did" be more than a max of 2 clicks?
What we really need is a "locked down mode" where every interaction (PR, issue, comment) with the repo that isn't from maintainers or specifically whitelisted people goes into a moderation queue. Maintainers can confirm or deny the action using a single click (which does not take 2 fucking seconds to load).
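To make the "locked down mode" idea concrete, here is a minimal sketch of a moderation queue. This is not a GitHub feature or API; `LockedDownRepo`, `Interaction`, and the allowlist are hypothetical names invented for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Interaction:
    author: str
    kind: str  # "pr", "issue" or "comment"
    body: str


@dataclass
class LockedDownRepo:
    """Hypothetical repo where untrusted interactions wait for approval."""
    maintainers: set
    allowlist: set = field(default_factory=set)
    queue: deque = field(default_factory=deque)
    published: list = field(default_factory=list)

    def submit(self, item: Interaction) -> str:
        # Maintainers and allowlisted users bypass moderation entirely.
        if item.author in self.maintainers or item.author in self.allowlist:
            self.published.append(item)
            return "published"
        # Everyone else is held until a maintainer decides.
        self.queue.append(item)
        return "queued"

    def review(self, approve: bool) -> Interaction:
        # One call per decision: the "single click" from the comment above.
        item = self.queue.popleft()
        if approve:
            self.published.append(item)
        return item


repo = LockedDownRepo(maintainers={"alice"}, allowlist={"bob"})
repo.submit(Interaction("alice", "pr", "real fix"))
repo.submit(Interaction("mallory", "pr", "slop"))
repo.review(approve=False)  # the slop PR never becomes visible
```

The point of the design is that a denied item is never published at all, so subscribers are not notified and nothing has to be cleaned up afterwards.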
This sentence also illustrates the absurdity of this investment model. It imposes a trade-off between building good software, and complying with the investor's metrics. They probably call such metrics evidence-based, but this example shows that they arbitrarily capture some numbers to obscure the lack of meaningful measurements.
https://github.com/LibreTranslate/LibreTranslate/blob/main/A...
A similar system would be nice for issues, though I'm not sure what it'd look like if issues are the springboard for contributing PRs.
Not likely to ever happen (as others said), GitHub/MS want to sell Copilot subscriptions/tokens and LLM-generated PRs are a part of that business model.
/s
The issue here is the core model is broken (misaligned incentives). That's not something you are going to fix with a github "downstream". A token system could help but it's easy to imagine ways that could be gamed, if not implemented well.
If search ads are blocked on search engines, then there is no revenue for the browser. It's that simple (on top of that Brave has other revenues, but the majority is search ads).
So it's a game of hoping that the majority won't change the default.
This is the main reason Brave does not block search ads specifically by default, but still blocks the other ads. Blocking the other ads has no consequences, since that revenue is not shared back to the browser anyway.
This is why the business model of Brave is cynical.
-> It's the same model as AdBlock and the "Acceptable Ads" (block all ads, except the acceptable ads, unless you disallow them)
> Maybe GitHub should temporarily block accounts from raising PRs if like 95%+ of them are getting rejected.
It's so bad I'd be okay with a lower bar where it's flagged if they're posting the same message over multiple repos... FFS they aren't even stopping this shit https://news.ycombinator.com/item?id=47964617
The rate of commits/PRs total
The rate of PRs to repos they don’t own
The reject rate of PRs
The number of bans
An estimated “AI” or bot score or status flag
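A rough sketch of how a few of the signals listed above could feed a simple flag. The field names, the `min_prs` sample-size guard, and the function names are assumptions made up for illustration; only the 95% reject-rate threshold comes from the quoted suggestion, and none of this reflects anything GitHub actually exposes.

```python
from dataclasses import dataclass


@dataclass
class AccountStats:
    # Hypothetical per-account counters, not a real GitHub API response.
    prs_opened: int
    prs_rejected: int
    prs_to_foreign_repos: int
    bans: int


def reject_rate(stats: AccountStats) -> float:
    """Share of this account's PRs that were closed without merging."""
    if stats.prs_opened == 0:
        return 0.0
    return stats.prs_rejected / stats.prs_opened


def foreign_share(stats: AccountStats) -> float:
    """Share of PRs opened against repos the account does not own."""
    if stats.prs_opened == 0:
        return 0.0
    return stats.prs_to_foreign_repos / stats.prs_opened


def should_flag(stats: AccountStats, max_reject_rate: float = 0.95,
                min_prs: int = 20) -> bool:
    # Require a minimum sample so a newcomer with one rejected PR
    # is not flagged; min_prs is an arbitrary illustrative value.
    return (stats.prs_opened >= min_prs
            and reject_rate(stats) >= max_reject_rate)
```

For example, `should_flag(AccountStats(100, 97, 100, 0))` is true, while an account with 5 PRs, all rejected, stays below the sample-size guard and is not flagged.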
There are a few better attempts at GitHub metrics calculators, but I have not seen any that move beyond the paradigm that more commits are assumed good by default. It’s time to foreground quality, not just quantity. The GitHub “4 kpis” are entirely action oriented.
We made "Github contributions" a metric for people applying for dev jobs. So, of course, because devs are the kind of people we are, they started working out how to game that metric.
Some folks decided to start paying bounties on bug fixes, features, etc. Those bounties are fairly trivial by western standards, but are significant for developing countries. This creates a new career for developers: racing to collect the bounties on offer.
LLMs have exacerbated these problems by letting the people already doing this do it faster, and by letting more people pretend to be software developers and get in on the action.
If we stopped allowing LLM-authored contributions we'd still have too many shitty PRs. It would just be back to pre-LLM levels of "too many".
The answer is to make Github contributions valueless. Stop paying bounties, and stop using them to assess candidates.
And it is not like AI spam would be limited to, or even primarily targeted at, bounties.
This [0] is an example, there are many more.
The whole idea that we have to have a "portfolio" of work.
[0] https://talentslab.io/7-strategies-for-a-junior-developer-to...
Cowboy coders got a virtual cowgirl coder and sold it to everyone, hmm, maybe... (respected or not, solo devs don't always have the requisite skills to not be a cowboy, either due to lack of experience or lack of innate skill)
I don't know that I completely buy this narrative, though. There has been a strong, top-down push for this since the "beginning".
Negative score would be reports from other users because of spammy content or not acknowledged issues, with a middle ground of neutral score (+-0) or little positive score to issues or whatever with clear good intention, but couldn't reach a proper merged PR or were not issues (e.g. issue existed but wasn't the correct repo to be addressed, PR was good but needed other stuff to be implemented prior to it, maybe in the long run, etc)
Given any manipulatable scheme, AI will figure out how to manipulate it. For the OP, what happens if a single AI manages to get through to contributor? Then it starts elevating other AIs to contributor, and we're off again. There doesn't have to be a purpose to this. Trolls will troll, and trolls armed with AI bots can devote endless energy to doing so. The more you work to keep them out, the more fun it becomes for them.
I wish I had an answer for that problem. But I don't.
You could probably use some kind of pairwise ranking algorithm (like anything based on the Bradley-Terry model) to rate human vs. AI contributions, but that would take a lot of manual effort. Google is using it to (supposedly) improve their searching algorithms. They give testers two different versions and ask them what's better.
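For anyone unfamiliar with it: the Bradley-Terry model assigns each item a positive strength p_i and says item i wins a comparison against item j with probability p_i / (p_i + p_j). Below is a minimal sketch of fitting those strengths from pairwise judgments using the standard iterative (minorization-maximization) update. The `wins` matrix is made-up example data, and the code assumes every item has at least one win so no strength collapses to zero.

```python
def fit_bradley_terry(wins, iterations=200):
    """Estimate Bradley-Terry strengths.

    wins[i][j] is the number of times item i was preferred over item j.
    Returns strengths normalized to sum to 1.
    """
    n = len(wins)
    p = [1.0] * n
    for _ in range(iterations):
        new_p = []
        for i in range(n):
            total_wins = sum(wins[i])
            # Sum over opponents: comparisons played / combined strength.
            denom = sum(
                (wins[i][j] + wins[j][i]) / (p[i] + p[j])
                for j in range(n) if j != i
            )
            new_p.append(total_wins / denom if denom else p[i])
        total = sum(new_p)
        p = [x / total for x in new_p]
    return p


def win_probability(p, i, j):
    """Model probability that item i is preferred over item j."""
    return p[i] / (p[i] + p[j])


# Made-up judgments between three contributions (illustrative only).
wins = [
    [0, 8, 9],
    [2, 0, 6],
    [1, 4, 0],
]
strengths = fit_bradley_terry(wins)
```

The manual effort mentioned above is in producing the `wins` matrix: every entry is a human looking at two contributions and picking one.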
The totality of someone's currency is their reputation.
Of course, now the decision becomes...who is the central currency issuer that creates it?
Then they'll get removed by the humans? It's about cutting down work, not about eliminating the work entirely.
The current approach removes about 99% of their overhead it would seem. If they have to do a few manual interventions here and there, that seems like a huge win overall
Frontier users: 527,865
Light indexed: 527,865
Ready to queue: 9,083
Fast scores ready: 0
Activity events 24h: 30,266
Fast scores completed 24h: 19,123
Deep jobs completed 24h: 3,043
Fast-score ETA: n/a
Deep-hydrate ETA: 69h
Stale running jobs: 0
GitHub backpressure jobs: 19,113
High automation signals: 4,608
Medium automation signals: 1,327
Completed jobs: 74,714
Biggest challenge is GitHub's rate limits. At this pace it will take two more months to have 98% coverage. But after that the maintenance should be quite straightforward.
The Elo rating system doesn't make sense in this context; it's designed around collecting zero sum game results for a given community of players and building a model around it.
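To illustrate the zero-sum point, here is a minimal sketch of the standard Elo update. The scale of 400 and K-factor of 32 are the conventional chess values, used here only as defaults. Whatever one player gains, the other loses, so the total rating in the pool never changes, which is why it maps poorly onto scoring contributions that have no opponent.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Predicted score for A against B (between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def elo_update(rating_a: float, rating_b: float, score_a: float,
               k: float = 32):
    """Return new ratings after one game.

    score_a is 1.0 for a win by A, 0.5 for a draw, 0.0 for a loss.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Zero sum: A's gain is exactly B's loss.
    return rating_a + delta, rating_b - delta


new_a, new_b = elo_update(1500, 1500, 1.0)  # -> (1516.0, 1484.0)
```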
There are a lot of political tricks that get used.
What is scary is that some of those users are malicious state actors, like North Korea and Russia...
The writing style in their onboarding doc has common AI tells (in the quote: em dashes, “it’s not A, it’s B” sentence).
I can understand that, perhaps they want to fight fire with fire or don’t have time as they already say. Still, it all feels like inadequate half measures to me.
At least bringing up the underlying method (restrict to contributors) has spawned the discussion about how that's probably a bad idea on the security side.
You can’t submit a PR because your laptop is too slow? Rent some hash rate from someone, and now you’ve just made a system of paying botnet owners to be able to make a typo fix on a github repo. HashCash was never used in the real world for a reason: it sounds cute, but the incentives are so insane that it only works in a vacuum where you assume nobody is cheating.
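For context, the proof-of-work idea being criticized looks roughly like this. It is a simplified sketch, not the real Hashcash stamp format: actual Hashcash uses SHA-1 and a structured header, whereas this uses SHA-256 over a made-up `resource:nonce` string. The sender burns CPU searching for a nonce; the receiver verifies with a single hash.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Number of zero bits at the start of the digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def stamp_digest(resource: str, nonce: int) -> bytes:
    return hashlib.sha256(f"{resource}:{nonce}".encode()).digest()


def mint(resource: str, bits: int) -> int:
    """Search for a nonce; expected work is about 2**bits hashes."""
    for nonce in count():
        if leading_zero_bits(stamp_digest(resource, nonce)) >= bits:
            return nonce


def verify(resource: str, nonce: int, bits: int) -> bool:
    """Checking a stamp costs one hash, regardless of difficulty."""
    return leading_zero_bits(stamp_digest(resource, nonce)) >= bits


nonce = mint("pr:example-repo", 12)  # small difficulty for the demo
assert verify("pr:example-repo", nonce, 12)
```

Nothing in the scheme ties the work to the submitter's own hardware, which is exactly the hole described above: the nonce can be computed by anyone, including rented or stolen machines.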
Sure, but looking at the cost to do it at scale is the wrong metric. I surely can't compete with a career spammer on emails-per-second or even emails-per-dollar, but I also don't need to.
It's more about the expected-value versus the cost. For example, my expected benefit from one email to my family is (while hard to quantify) hopefully much higher than a spammer's expected benefit of one spam email going out, which has a very small chance of leading to any amount of money. Attaching a CPU-churn cost per email is something I can ignore on my desktop, but they have to at least budget for it.
I'd also like to note that the win-condition isn't as extreme as making spam (or other "crimes") truly unprofitable, it just needs to be less profitable than other things the time/resources could be used for.
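The expected-value argument can be put into a few lines of arithmetic. Every number below is invented purely for illustration (the per-email cost, the probabilities, and the payoffs are not measurements of anything); the only point is that the same fixed cost is negligible for one sender and decisive for the other.

```python
def expected_net_value(success_probability: float, payoff: float,
                       cost_per_email: float) -> float:
    """Expected benefit of one email minus the CPU cost attached to it."""
    return success_probability * payoff - cost_per_email


# Hypothetical CPU-churn cost per email, in dollars.
COST_PER_EMAIL = 0.001

# Hypothetical: an email to family almost always "works" and is worth
# a lot to the sender relative to the cost.
family = expected_net_value(0.9, 1.00, COST_PER_EMAIL)

# Hypothetical: one spam email in a million leads to a 50 dollar payout.
spam = expected_net_value(1e-6, 50.0, COST_PER_EMAIL)

print(f"family email: {family:+.5f}")
print(f"spam email:   {spam:+.5f}")
```

With these made-up inputs the family email stays clearly positive while the spam email goes negative. As noted above, the scheme does not even need to push spam below zero, only below whatever else the same resources could earn.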
We really need to solve spam itself here, and I think there may be a way to do it. The problem of spam is N-to-N scaling of connections. The network has never been able to solve that problem (exponential is the hardest). Limiting communication in terms of mesh networking may be the ultimate solution - bots can't get to you because they can't reach you.
What needs to be invented is a bridging protocol - some way to establish "legitimate" lines of communication over a network, while preserving (to some degree) privacy and decentralization. AI can only enter this network by being explicitly added to the channel, and thereby explicitly and easily blocked (and also solving the general SPAM issue once and for all).