Posted by keybits 21 hours ago
It’s an extension of the asymmetric bullshit principle IMO, and I think now all workplaces / projects need norms about this.
Where is the problem? If I don't have the time to review a PR, I simply reject it. Or if I am flooded with PRs, I only take those from people whose PRs I know to be of high quality. In other words: your assumption "expecting people to review and act upon it" is wrong.
Though I would bet that for the kind of code I voluntarily write in my free time, using an LLM to generate lots of code is much less helpful, because I use such private projects to try out novel things that are typically not "digested stuff from the internet".
So the central problem I see instead is the licensing uncertainty around AI-generated code.
I wonder how it would look if open source projects required $5 to submit a PR or ticket and then paid out a bounty for successful, or at least reasonable, PRs. Essentially a "paid proof of legitimacy".
Unfortunately, there is no community equivalent of proof of stake; the only alternative is introducing different barriers, like ID verification, payment, in-person interviews, private invite systems, etc., which often conflict with the nature of anonymous volunteer communities.
Such communities are perhaps one of the greatest things the Web has given us, and it is sad to see them struggle.
(I can imagine LLM operators jumping on the opportunity to sell some of these new barriers, to profit from selling both the problematic product and a product to work around those problems.)
That is their business model. Use AI to create posts on LinkedIn, emails in a corporate environment, etc. And then use AI to summarize all that text.
AI creates a problem and then offers a solution.
My current approach is to look at news sources like The Guardian, Le Monde, AP News, etc. I know that they put in the work; sadly, places like Reddit are just becoming forums that discuss garbage news with bot comments. (I could use AI to identify non-bot comments and news sources, but it doesn't really work even when it claims it does, and I shouldn't have to do that in the first place either.)
But why should this expectation be honored? If someone spends close to zero effort generating a piece of code and lobs it over the fence to me, why would I even look at it? Particularly if it doesn't even meet the requirements for a pull request (which is what it seems like the article is talking about)?
I don't think the definition of collaboration includes making close to zero effort and expecting someone else to expend considerable effort in return.
But if you stop looking at PRs entirely, you eliminate the ability for new contributors to join a project or make changes that improve the project. This is where the conflict comes from.
After a minute (or whatever length of time makes sense for the project), decide whether you're confident the PR is worth your time to keep reviewing, with the default answer being "no" if you're on the fence. Unless it's a clear yes, you got a bad vibe: close it and move on. Getting a PR merged then requires more effort from the contributor to make the case that there's value in keeping it open, which restores some of the balance that was lost when the effort got pushed onto the review side.
If I can say I trust you, the websites you trust will be prioritised for me and marked as reliable (no AI slop, actual humans writing content).
Like a recognition that there's value there, but we're passing the frothing-at-the-mouth stage of replacing all software engineers?
I still don't see how it's useful for generating features and codebases, but as a rubber ducky it ain't half bad.
What has helped has been to turn off ALL automatic AI, e.g. auto complete, and bind it to a shortcut key to show up on request... And forget it exists.
Until I feel I need it, and then it's throw shit at the wall type moment but we've all been there.
It does save a lot of time as Google on steroids and a wtf-solver. But it's a tool best kept in its box, with a safety lock.
That's one way of looking at it.
Another way to look at it is GPT3.5 was $600,000,000,000 ago.
Today's AIs are better, but are they $600B better? Does it feel like that investment was sound? And if not, how much slower will future investments be?
This just smells like classic VC churn and burn. You are given it and have to spend it. And most of that money wasn't actually money, it was free infrastructure. Who knows the actual "cost" of the investments, but my uneducated brain (while trying to make a point) would say it is 20% of the stated value of the investments. And maybe GPT-5 + the other features OpenAI has enabled are $100B better.
But everyone who chipped in $$$ is counting against these top line figures, as stock prices are based on $$$ specifically.
> but my uneducated brain (while trying to make a point) would say it is 20% of the stated value of the investments
An 80% drop in valuations as people snap back to reality would be devastating to the market. But that's the implication of your line here.
I'm sure there's still some improvements that can be made to the current LLMs, but most of those improvements are not in making the models actually better at getting the things they generate right.
If we want more significant improvements in what generative AI can do, we're going to need new breakthroughs in theory or technique, and that's not going to come by simply iterating on the transformers paper or throwing more compute at it. Breakthroughs, almost by definition, aren't predictable, either in when or whether they will come.
A different way to say it: imagine if programming a computer were more like training a child or a teenager to perform a task that requires a lot of human interaction, and that interaction required presenting data and making drawings.
GPT-5 and GPT-5-codex are significantly cheaper than the o-series full models from OpenAI, but outperform them.
I won't get into whether the improvements we're seeing are marginal or not, but whether or not that's the case, these examples clearly show you can get improved performance with decreasing resource cost as techniques advance.
You mean what they have conceded, so far, to be what they mean. With every new model they find they have to give up a little more.
It feels like people and projects are moving from a pure “get that slop out of here” attitude toward more nuance, more confidence articulating how to integrate the valuable stuff while excluding the lazy stuff.
I get value from it everyday like a lawyer gets value from LexisNexis. I look forward to the vibe coded slop era like a real lawyer looks forward to a defendant with no actual legal training that obviously did it using LexisNexis.
The funny thing is you're clearly within the hyperbolic pattern that I've described. It could plateau, but denying that you're there is incorrect.
I'm genuinely curious as to what's going through your mind and if people readily give you this.
I suspect you're asking dishonestly but I can't simply assume that.
You should delete this comment.
I use it like this: if a PR is LLM-generated, you as a maintainer either merge it if it's good or close it if it's not. If it's human-written, you may spend some time reviewing the code and iterating on the PR as you used to.
Saves your time without discarding LLM PRs completely.
I really like the way Discourse uses "levels" to slowly open up features as new people interact with the community, and I wonder if GitHub could build in a way of allowing people to open PRs only after a certain amount of interaction, too (for example, you can only raise a large PR if you have spent enough time raising small PRs).
This could of course be abused and/or lead to unintended restrictions (e.g. a small change in lots of places), but that's also true of Discourse and it seems to work pretty well regardless.
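For what it's worth, the gate itself could be tiny. Here is a rough Rust sketch of the idea; the level names and thresholds are invented, and the merged-PR count and account age would have to come from something like the GitHub API (or from GitHub itself, if it ever built this in):

```rust
// Hypothetical trust-level gate; levels and thresholds are made up for illustration.
#[derive(Debug)]
enum TrustLevel {
    NewContributor, // may only open small PRs
    Regular,        // may open medium-sized PRs
    Trusted,        // no size limit
}

/// Derive a level from prior activity (numbers are arbitrary examples).
fn trust_level(merged_prs: u32, days_since_first_pr: u32) -> TrustLevel {
    match (merged_prs, days_since_first_pr) {
        (m, d) if m >= 10 && d >= 90 => TrustLevel::Trusted,
        (m, d) if m >= 3 && d >= 14 => TrustLevel::Regular,
        _ => TrustLevel::NewContributor,
    }
}

/// Decide whether a PR of this size should even enter the review queue.
fn pr_allowed(level: &TrustLevel, changed_lines: u32) -> bool {
    match level {
        TrustLevel::NewContributor => changed_lines <= 100,
        TrustLevel::Regular => changed_lines <= 1_000,
        TrustLevel::Trusted => true,
    }
}

fn main() {
    // A contributor with 2 merged PRs over 30 days opens a 2,500-line PR.
    let level = trust_level(2, 30);
    println!("{:?}: 2,500-line PR allowed? {}", level, pr_allowed(&level, 2_500));
}
```

A CI job could run a check like this on each new PR and label or auto-close anything over the limit, much the way Discourse promotes people between trust levels automatically.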
From https://news.ycombinator.com/newsguidelines.html: "Please use the original title, unless it is misleading or linkbait" (note that word unless)
I don't like it but I can hardly blame them.
Usually engagement-bait titles are cover for uninteresting articles, but yeah in this case it's way more interesting than the title to me anyway.
I guess it makes it even more obvious when people are discussing the title instead of the actual piece, which is routine on HN but not always obvious! Although to be fair, the title does describe one part of the piece: the part with the least original insight.
Now, that being said, a person should feel free to do what they want with their code. It's somewhat tough to justify the work of setting up infrastructure to do that on small projects, but AI PRs aren't likely a big issue for small projects.
...that's just scratching the surface.
The problem is that LLMs make mistakes that no single human would make, and coding conventions should never be the focus of a code review anyway; they should usually be enforced by tooling.
E.g. when reading/reviewing other people's code you tune into their brain and thought process: after reading a few lines of (non-trivial) code you know subconsciously what kind of 'programming character' this person is and what type of problems to expect and look for.
With LLM generated code it's like trying to tune into a thousand brains at the same time, since the code is a mishmash of what a thousand people have written and published on the internet. Reading a person's thought process via reading their code doesn't work anymore, because there is no coherent thought process.
Personally I'm very hesitant to merge PRs into my open source projects that are more than small changes of a couple dozen lines at most, unless I know and trust the contributor to not fuck things up. E.g. for the PRs I'm accepting I don't really care if they are vibe-coded or not, because the complexity for accepted PRs is so low that the difference shouldn't matter much.
Alas…
Some people will absolutely just run something, let the AI work like a wizard and push it in hopes of getting an "open source contribution".
They need to understand due diligence and reduce the overhead on maintainers, so that maintainers don't have to review things before it's really needed.
It's a hard balance to strike, because you do want to make it easy on new contributors, but this is a great conversation to have.
A couple of weeks ago I needed to stuff some binary data into a string, in a way where it wouldn't be corrupted by whitespace changes.
I wrote some Rust code to generate the string. After I typed "}" to end the method, Copilot first suggested a 100% correct method to parse the string back to binary data, and then suggested a 100% correct unit test.
I read both methods, and they were identical to what I would write. It was as if Copilot could read my brain.
BUT: If I relied on Copilot to come up with the serialization form, or even know that it needed to pick something that wouldn't be corrupted by whitespace, it might have picked something completely wrong, that didn't meet what the project needed.
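For context, the shape of such code might look roughly like this. This is a hand-written sketch, not Copilot's output, and it assumes hex was the encoding; the real project may well have used something else:

```rust
/// Encode bytes as hex, wrapped every 16 bytes for readability.
fn to_hex_string(data: &[u8]) -> String {
    let mut out = String::new();
    for (i, byte) in data.iter().enumerate() {
        if i > 0 && i % 16 == 0 {
            out.push('\n');
        }
        out.push_str(&format!("{byte:02x}"));
    }
    out
}

/// Parse the string back to bytes, tolerating any whitespace changes.
fn parse_hex_string(s: &str) -> Result<Vec<u8>, String> {
    let cleaned: String = s.chars().filter(|c| !c.is_whitespace()).collect();
    if cleaned.len() % 2 != 0 {
        return Err("odd number of hex digits".into());
    }
    cleaned
        .as_bytes()
        .chunks(2)
        .map(|pair| {
            let digits = std::str::from_utf8(pair).map_err(|e| e.to_string())?;
            u8::from_str_radix(digits, 16).map_err(|e| e.to_string())
        })
        .collect()
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn round_trips_even_after_reformatting() {
        let data: Vec<u8> = (0u8..=255).collect();
        let encoded = to_hex_string(&data);
        // Simulate an editor or formatter reflowing the string.
        let reflowed = encoded.replace('\n', "  ");
        assert_eq!(parse_hex_string(&encoded).unwrap(), data);
        assert_eq!(parse_hex_string(&reflowed).unwrap(), data);
    }
}
```

The design decision that matters here, picking a representation whose parser simply ignores whitespace, is exactly the part the tool couldn't have known to make on its own.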
The biggest place I've seen AI-created code with tests produce a false positive is when a specific feature is being tested, but the test case overwrites a global data structure. Fixing the test reveals the implementation to be flawed.
Now imagine you get rewarded for shipping new features and test code, but are derided for refactoring old code. The person who goes in to fix the AI slop is frowned upon, while the AI slop driver gets recognition for being a great coder. This dynamic, caused by AI coding tools, is creating perverse workplace incentives.
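To make the false-positive pattern above concrete, here is a minimal invented Rust example; the pricing names and numbers are hypothetical, not from any real codebase:

```rust
use std::sync::Mutex;

/// Global rate shared by the whole (hypothetical) pricing module.
static TAX_RATE: Mutex<f64> = Mutex::new(0.20);

/// Buggy implementation: it should ADD tax, but it subtracts it.
fn price_with_tax(net: f64) -> f64 {
    let rate = *TAX_RATE.lock().unwrap();
    net - net * rate
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn looks_like_it_tests_the_feature() {
        // The test "helpfully" zeroes the global, so adding and subtracting
        // tax become indistinguishable and the broken code passes.
        *TAX_RATE.lock().unwrap() = 0.0;
        assert_eq!(price_with_tax(100.0), 100.0);
        // Fixing the test (restoring 0.20 and expecting 120.0) immediately
        // exposes the flawed implementation.
    }
}
```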