Posted by dw64 4 days ago
Sorry folks but we lost.
There are far more ways to produce expensive noise with LLMs than signal. Most non-psychopathic humans tend to want to produce veridical statements. (Except salespeople, who have basically undergone forced sociopathy training.) At the point where a human has learned to produce coherent language, he's also learned lots of important things about the world. At the point where a human has learned academic jargon and mathematical nomenclature, she has likely also learned a substantial amount of math. Few people manage to learn the syntax of a language while acquiring little of the underlying understanding. Alas, this is not the case with statistical models of papers!
How will journals or conferences handle AI slop?
So the LLM detection problem is (theoretically) impossible against SOTA LLMs; in practice it may be more tractable, because the RLHF stage inserts detectable idiosyncrasies.
Anecdotal: A few weeks ago, I came across a story on HN where many commenters immediately recognized that the article had been written by an LLM. The author had actually released his prompts and iterations, so it was not a one-shot prompt but more like ten iterations, and still many people could tell an LLM wrote it.
And anyway, those accuracies tend to be measured on 100% human-written vs. 100% machine-generated texts from a single LLM... good luck with texts that mix human and LLM content, mix content from several LLMs, or where one LLM is asked to "mask" the output of another.
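(Aside: a minimal sketch of how one might probe that gap, purely illustrative; detector_score, human_pool, and llm_pool below are hypothetical placeholders, not any particular tool or benchmark.)

    import random

    # Hypothetical stand-in for any off-the-shelf detector: returns a score in
    # [0, 1], higher meaning "more likely machine-generated". Swap in whatever
    # detector you want to evaluate.
    def detector_score(text: str) -> float:
        raise NotImplementedError("plug in your detector here")

    def mix_document(human_sents, llm_sents, llm_fraction, n_sents=10, rng=None):
        """Build a synthetic document whose sentences are drawn from a
        human-written pool and an LLM-written pool in the given proportion."""
        rng = rng or random.Random(0)
        n_llm = round(n_sents * llm_fraction)
        sents = rng.sample(llm_sents, n_llm) + rng.sample(human_sents, n_sents - n_llm)
        rng.shuffle(sents)
        return " ".join(sents)

    def flag_rate_at_mix(human_sents, llm_sents, llm_fraction, threshold=0.5, trials=200):
        """Fraction of mixed documents the detector flags as machine-generated.
        Published accuracies are effectively the llm_fraction = 0.0 and 1.0
        endpoints; everything in between is where they stop meaning much."""
        rng = random.Random(42)
        hits = 0
        for _ in range(trials):
            doc = mix_document(human_sents, llm_sents, llm_fraction, rng=rng)
            hits += detector_score(doc) >= threshold
        return hits / trials

    # Sweep the mix ratio between the two "pure" endpoints:
    # for frac in (0.0, 0.25, 0.5, 0.75, 1.0):
    #     print(frac, flag_rate_at_mix(human_pool, llm_pool, frac))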
I think detection is a lost cause.
They should solve the real problem of obtaining more funding and volunteers so that they can take on the increased volume of submissions. Especially now that AI's here and we can all be 3 times as productive for the same effort.
Huh, I guess it's only a subset of papers, not all of them. My brain doesn't work that way, because I don't like assigning custom rules for special cases (edit: because I usually view that as a form of discrimination). So sometimes I have a blind spot around the realities of a problem someone is actually facing, realities that don't have much to do with its idealization.
What I mean is, I don't know that it's up to arXiv to determine what a "review article and position paper" is. Because of that, they should either let all papers through or hold all papers to the same review standards.
When I see someone getting their fingers into something, like muddying or dithering concepts, or shifting focus away from the crux of an argument (or using bad-faith arguments, etc.), I view it as corruption. It's a means for minority forces to impose their will on the majority. In this case, by potentially blocking meaningful work from reaching the public eye on a technicality.
So I admit that I was wrong to jump to conclusions. But I don't know that I was wrong in principle or spirit.
Those are terms of art, not arbitrary categories. They didn't make them up.
This does not seem like a win even if your “fight AI with AI” plan works.