Posted by dw64 4 days ago

Updated practice for review articles and position papers in ArXiv CS category (blog.arxiv.org)
496 points | 237 comments
generationP 4 days ago|
I have a hunch that most of the slop is not just in CS but specifically about AI. For some reason, a lot of people's first idea when they encounter an LLM is "let's have this LLM write an opinion piece about LLMs", as if they want to test its self-awareness or hack it by self-recursion. And then they get a medley of the training data, which, if they are lucky, contains some technical explanations sprinkled in.

That said, AI-generated papers have already been spotted in other disciplines besides CS, and some of them are really obvious (arXiv:2508.11634v1 starts with a review of a non-existent paper). I really hope arXiv won't react by narrowing its scope to "novel research only"; in fact, there is already AI slop in that category, and it is harder for a moderator to spot.

("Peer-reviewed papers only" is mostly equivalent to "go away". Authors post on the arXiv in order to get early feedback, not just to have their paper openly accessible. And most journals at least formally discourage authors from posting their papers on the arXiv.)

kittikitti 2 days ago||
In my experience, arXiv is not a preprint platform. It's a strange gatekeeper of science and should be avoided altogether. They have their favorites, which they deem "high quality", and everything else gets rejected. I am eagerly waiting for people to dismiss arXiv altogether.
naveen99 4 days ago||
Isn’t GitHub the normal way of publishing for CS now?
cubefox 4 days ago||
The PDFs (yes, they still use PDF) keep being uploaded to arXiv.
naveen99 4 days ago||
ArXiv is just extra steps for a worse experience. GitHub is perfectly fine for PDFs too.
cubefox 2 days ago||
But arXiv carries a certain ... reputation. I assume that's why papers keep being uploaded there.
macleginn 4 days ago||
Does Google Scholar index it?
ninetyninenine 4 days ago||
Didn’t realize LLMs were restricted to only CS topics.

I don’t understand why arXiv restricted only one category when the problem spans multiple categories.

habinero 3 days ago||
If you read through the papers, you'll realize the actual problem is blatant abuse and reputation hacking.

So many "research papers" by "AI companies" are just blog posts or marketing dressed up as research. They contribute nothing and exist so the dudes running the company can point to all their "published research".

Quizzical4230 3 days ago||
Shameless plug.

PaperMatch [1] helps with this problem (the large influx of papers) by running semantic search over the abstracts of all of arXiv; a minimal sketch of the idea is below.

[1]: https://papermatch.me/
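The core idea is simple: embed every abstract once, embed the query, rank by cosine similarity. A minimal sketch, assuming the sentence-transformers library (the model name and the two-paper toy corpus are illustrative; this is not PaperMatch's actual code):

    # Toy semantic search over paper abstracts.
    # Assumptions: sentence-transformers is installed; the model name and
    # the tiny corpus are illustrative, not PaperMatch's real setup.
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")

    abstracts = {
        "paper-1": "We propose an attention-only architecture for sequence transduction.",
        "paper-2": "We survey recent benchmarks for retrieval-augmented generation.",
    }
    ids = list(abstracts)
    corpus_emb = model.encode([abstracts[i] for i in ids], convert_to_tensor=True)

    query_emb = model.encode("attention-based sequence models", convert_to_tensor=True)

    # Cosine similarity between the query and every abstract, best match first.
    scores = util.cos_sim(query_emb, corpus_emb)[0]
    for rank in scores.argsort(descending=True):
        print(ids[int(rank)], float(scores[rank]))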

an0malous 4 days ago||
Why not just reject papers authored by LLMs and ban accounts that are caught? arXiv’s management has become really questionable lately; it’s like they’re trying to become a prestigious journal and are turning into the problem they were trying to solve in the first place.
tarruda 4 days ago||
> Why not just reject papers authored by LLMs and ban accounts that are caught?

Are you saying that there's an automated method for reliably verifying that something was created by an LLM?

an0malous 4 days ago||
If there wasn’t, then how do they know LLMs are the problem?
orbital-decay 4 days ago|||
What matters is quality. Requiring reviews and opinion pieces to be peer-reviewed is a lot less superficial than rejecting LLM-assisted papers (which can be valid), and it makes a reasonable filter for papers with no first-party contributions. I'm sure they ran the actual numbers as well.
catlifeonmars 4 days ago||
It’s articles (not papers) _about_ LLMs that are the problem, not papers written _by_ LLMs (although I imagine the two are not mutually exclusive). The title is ambiguous.
dabber 4 days ago||
> It’s articles (not papers) _about_ LLMs that are the problem, not papers written _by_ LLMs

No, not really. From the blog post:

> In the past few years, arXiv has been flooded with papers. Generative AI / large language models have added to this flood by making papers – especially papers not introducing new research results – fast and easy to write. While categories across arXiv have all seen a major increase in submissions, it’s particularly pronounced in arXiv’s CS category.

> [...]

> Fast forward to present day – submissions to arXiv in general have risen dramatically, and we now receive hundreds of review articles every month. The advent of large language models have made this type of content relatively easy to churn out on demand, and the majority of the review articles we receive are little more than annotated bibliographies, with no substantial discussion of open research issues.

GMoromisato 4 days ago||
I suspect that LLMs are better at classifying novel vs junk papers than they are at creating novel papers themselves.

If so, I think the solution is obvious.

(But I remind myself that all complex problems have a simple solution that is wrong.)
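"Obvious" presumably means putting an LLM in the moderation loop. A hedged sketch of what such a triage pass could look like (the model name and prompt are assumptions for illustration, not anything arXiv actually runs):

    # Hedged sketch: LLM-as-triage for incoming submissions.
    # Assumptions: the openai package (v1 API) with an API key in the
    # environment; the model name and prompt are illustrative only.
    from openai import OpenAI

    client = OpenAI()

    def triage(abstract: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system",
                 "content": "Classify this abstract as NOVEL (new research "
                            "results) or SURVEY (annotated bibliography or "
                            "review with no new results). Answer in one word."},
                {"role": "user", "content": abstract},
            ],
        )
        return resp.choices[0].message.content.strip()

    print(triage("We survey recent progress in large language models."))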

thatguysaguy 4 days ago||
Verification via LLM tends to break under quite small optimization pressure. For example, I did RL to improve <insert aspect> against one of the SOTA models from one generation ago, and the (quite weak) learner model discovered that it could emit a few nonsense words to get the max score.

That's without even being able to backprop through the annotator, and with me actively trying to avoid reward hacking. If arXiv used an open model for review, it would be trivial for people to insert a few grammatical mistakes that cause them to receive max points.
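A toy illustration of that failure mode, with a deliberately flawed, made-up judge function and random search standing in for RL:

    # Toy reward hacking: random search finds a suffix that inflates a
    # flawed judge's score. judge_score is a hypothetical stand-in for a
    # learned verifier; no real model is involved.
    import random

    def judge_score(text: str) -> float:
        # Deliberately flawed judge: rewards length and one rare token.
        return 0.01 * len(text) + text.count("zyx")

    base = "A mediocre auto-generated review of recent LLM papers."
    best, best_score = base, judge_score(base)
    for _ in range(1000):
        cand = best + " " + "".join(random.choices("abczyx", k=4))
        score = judge_score(cand)
        if score > best_score:
            best, best_score = cand, score

    # The "optimizer" reliably converges on the judge's blind spot.
    print(round(best_score, 2), best[-40:])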

HL33tibCe7 4 days ago||
> I suspect that LLMs are better at classifying novel vs junk papers than they are at creating novel papers themselves.

Doubt

LLMs are experts at generating junk and generally terrible at anything novel. Classifying novel vs. junk is a much harder problem.

beloch 4 days ago||
A better policy might be for arXiv to do the following:

1. Require LLM-produced papers to be attributed to the relevant LLM, not to the person who wrote the prompt.

2. Treat submissions that misrepresent authorship as plagiarism. Remove the article, but leave an entry for it so that there is a clear indication that the author engaged in an act of plagiarism.

Review papers are valuable. Writing one is a great way to gain, or deepen, mastery over a field. It forces you to branch out and fully assimilate papers that you may have only skimmed, and then place them in their proper context. Reading quality review papers is also valuable. They're a great way for people new to a field to get up to speed and they can bring things that were missed to the fore, even for veterans of the field.

While the current generation of AI does a poor job of judging significance and highlighting what is actually important, it could improve in the future. However, there's no need for arXiv to accept hundreds of review papers written by the same model on the same field, and readers certainly don't want to sift through them all.

Clearly marking AI submissions and removing credit from the prompters would adequately future-proof things for when, and if, AI can produce high-quality review papers. Clearly marking authors who engage in plagiarism as plagiarists will, hopefully, remove most of the motivation to spam arXiv with AI slop misrepresented as the work of humans.

My only concern would be the cost to arXiv of dealing with the inevitable lawsuits. The policy arXiv has chosen is worse for science, but it is less likely to get them sued by butt-hurt plagiarists or the very occasional false positive.

habinero 3 days ago|
That doesn't solve the problem they're trying to solve, which is that their all-volunteer staff is being flooded with LLM slop and doesn't have time to moderate it all by hand.

If you want to blame someone, blame all the people LARPing as AI researchers.

beloch 3 days ago||
The majority of these submissions are not from anonymous trolls. They're from identifiable individuals who are trying to game metrics. The threat of adding plagiarism offences to their public record would deter such individuals quite effectively.

Meanwhile, banning review articles written by humans would be harmful in many fields. I'm not in CPSC, but I'd hate to see this policy become the norm for all disciplines.

bob1029 4 days ago|
> The advent of large language models have made this type of content relatively easy to churn out on demand, and the majority of the review articles we receive are little more than annotated bibliographies, with no substantial discussion of open research issues.

I have to agree with their justification. Since "Attention Is All You Need" (2017) I have seen maybe four papers with similar impact in the AI/ML space. The signal-to-noise ratio is really awful. If I had to pick a semi-related paper published since 2020 that I actually found interesting, it would have to be this one: https://arxiv.org/abs/2406.19108 - I cannot think of a close second right now.

All of the machine learning papers are pure slop to me now. The last one I looked at had an abstract so long it put me to sleep. Many of these papers aren't even attempting basic decorum anymore. Mandatory peer review would fix a lot of this. I don't think it is acceptable for the staff at arXiv to have to endure a Sisyphean mountain of LLM shit. They definitely need to push back.

an0malous 4 days ago||
Isn’t the signal-to-noise problem what journals are supposed to be for? I thought arXiv was supposed to just be a record keeper, to make it easy to share papers and preprints.
Al-Khwarizmi 3 days ago|||
You picked arguably the most impactful AI/ML paper of the century so far; no wonder you don't find others with similar impact.

Not every paper can be a world-changing breakthrough. Which doesn't mean that more modest papers are noise (although some definitely are). What Kuhn calls "normal science" is also needed for science to work.

programjames 4 days ago||
This is only for review/position papers, though I agree that pretty much all ML papers for the past 20 years have been slop. I also consider the big names like "Adam", "Attention", or "Diffusion" slop because, even though they are powerful and useful, the presentation is so horrible (for the first two) or they contain major mistakes in the justification of why they work (the last two) that they should never have gotten past review without major rewrites.