Posted by dw64 4 days ago
Sorry folks but we lost.
There are far more ways to produce expensive noise with LLMs than signal. Most non-psychopathic humans tend to want to produce veridical statements. (Except salespeople, who have basically undergone forced sociopathy training.) At the point where a human has learned to produce coherent language, he's also learned lots of important things about the world. At the point where a human has learned academic jargon and mathematical nomenclature, she has likely also learned a substantial amount of math. Few people manage to learn the syntax of a language while acquiring little of the underlying understanding. Alas, this is not the case with statistical models of papers!
How will journals or conferences handle AI slop?
So the LLM detection problem is (theoretically) impossible against SOTA LLMs; in practice it may be more tractable, because the RLHF stage inserts detectable idiosyncrasies.
Anecdotal: A few weeks ago, I came across a story on HN where many commenters immediately recognized that the article had been written by an LLM. The author had actually released his prompts and iterations, so it was not a one-shot prompt but more like ten iterations, and still many people could tell an LLM wrote it.
And anyway, those accuracies tend to be measured on 100% human-written vs. 100% machine-generated texts from a single LLM... good luck with texts that mix human and LLM content, mix content from several LLMs, or where one LLM is asked to "mask" the output of another.
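(Aside: a minimal sketch of how one might probe that gap, purely illustrative; detector_score, human_pool, and llm_pool below are hypothetical placeholders, not any particular tool or benchmark.)

    import random

    # Hypothetical stand-in for any off-the-shelf detector: returns a score in
    # [0, 1], higher meaning "more likely machine-generated". Swap in whatever
    # detector you want to evaluate.
    def detector_score(text: str) -> float:
        raise NotImplementedError("plug in your detector here")

    def mix_document(human_sents, llm_sents, llm_fraction, n_sents=10, rng=None):
        """Build a synthetic document whose sentences are drawn from a
        human-written pool and an LLM-written pool in the given proportion."""
        rng = rng or random.Random(0)
        n_llm = round(n_sents * llm_fraction)
        sents = rng.sample(llm_sents, n_llm) + rng.sample(human_sents, n_sents - n_llm)
        rng.shuffle(sents)
        return " ".join(sents)

    def flag_rate_at_mix(human_sents, llm_sents, llm_fraction, threshold=0.5, trials=200):
        """Fraction of mixed documents the detector flags as machine-generated.
        Published accuracies are effectively the llm_fraction = 0.0 and 1.0
        endpoints; everything in between is where they stop meaning much."""
        rng = random.Random(42)
        hits = 0
        for _ in range(trials):
            doc = mix_document(human_sents, llm_sents, llm_fraction, rng=rng)
            hits += detector_score(doc) >= threshold
        return hits / trials

    # Sweep the mix ratio between the two "pure" endpoints:
    # for frac in (0.0, 0.25, 0.5, 0.75, 1.0):
    #     print(frac, flag_rate_at_mix(human_pool, llm_pool, frac))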
I think detection is a lost cause.
They should solve the real problem of obtaining more funding and volunteers so that they can take on the increased volume of submissions. Especially now that AI's here and we can all be 3 times as productive for the same effort.
Huh, I guess it's only a subset of papers, not all of them. My brain doesn't work that way, because I don't like assigning custom rules for special cases (edit: because I usually view that as a form of discrimination). So sometimes I have a blind spot around the realities of a problem someone is actually facing, realities that don't have much to do with its idealization.
What I mean is, I don't know that it's up to arXiv to determine what a "review article and position paper" is. Because of that, they should either let all papers through or hold all papers to the same review standards.
When I see someone getting their fingers into something, like muddying or dithering concepts, or shifting focus away from the crux of an argument (or using bad-faith arguments, etc.), I view it as corruption. It's a means for minority forces to impose their will on the majority. In this case, by potentially blocking meaningful work from reaching the public eye on a technicality.
So I admit that I was wrong to jump to conclusions. But I don't know that I was wrong in principle or spirit.
Those are terms of art, not arbitrary categories. They didn't make them up.
This does not seem like a win even if your “fight AI with AI” plan works.