Posted by thehappyfellow 10 hours ago
I don't know if I agree with either assertion… I've seen plenty of human-generated knowledge work that was factually correct, well-formatted, and extremely low quality on a conceptual level.
And AI signatures are now easy for people to recognize. In fact, these turns of phrase aren't just recognizable—they're unmistakable. <-- See what I did there?
Having worked with corporate clients for 10 years, I don't view the pre-LLM era as a golden age of high-quality knowledge work. There was a lot of junk that I would also classify as a "working simulacrum of knowledge work."
Most importantly, those sources of errors tend to be consistent. I can trust a certain intern to be careful but ignorant, or my senior colleague with a newborn daughter to be a well of knowledge who sometimes misses obvious things due to lack of sleep.
With AI it's anyone's guess. They implement a paper in code flawlessly and make freshman-level mistakes in the same run. So you have to engage in the non-intuitive task of reviewing assuming total incompetence, for a machine that shows extreme competence. Sometimes.
AI signatures don't mean low quality, they just mean AI. And humans do use them (I have always used the common AI signatures). And yes, humans produce good-looking garbage, but much more commonly they produce bad-looking garbage. This is all tangential to the point.
It is valuable to have this, because if the work passes the first check then it is easier to identify the actual problems. It's the same reason we fix code quality and lint style issues before reasoning about the actual logic being written.
This is especially true if we start to see more of a split in usage between LLMs based on cost. High quality frontier models might produce better work at a higher cost, but there is also economic cost pressure from the bottom. And just like with human consultants or employees, you’ll pay more for higher quality work.
I’m not quite sure what I’m trying to argue here. But the idea that an LLM won’t produce a low quality report just seemed silly to me.
Working in a team isn’t adversarial. If I’m reviewing my colleague’s PR, they are not trying to skirt around a feature or cheat on tests.
I can tell when a human PR needs more in-depth reviewing because small things may be out of place, a mutex that may not be needed, etc. I can ask them about it and their response will tell me whether they know what they are on about, or whether they need help in this area.
I’ve had LLM PRs be defended by their creator until proven to be a pile of bullshit. Unfortunately, only deep analysis gets you there.
Putting a high level of polish on bad ideas is basically the grifter playbook. Throughout the business world you will find workers and entire businesses who get their success by dressing up poor ideas and bad products with all of the polish and trimmings associated with high quality work.
You wouldn't use a calculator that is as good as a human and makes mistakes as often.
But if you are trying to understand something well, there is no better tool for helping you than AI.
It is not so much that the "tells" of poor-quality work are vanishing, but that even careful scrutiny of work done with AI is going to become too costly to be done only by humans. One only has so much time to read while, say, in economics journals, the appendices extend to hundreds of pages.
Would love to hear if other fields' journals are experiencing similar pressure not only at the extensive margin (number of new submissions) but also at the intensive margin (effort needed to check each work).
`simulacrum` is a great word, gotta add that to my vocabulary.
Verifying the correctness of solutions is often much easier than finding correct solutions yourself. Examples: Sudoku and most practical problems in just about any field.
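To make the Sudoku example concrete, here is a minimal sketch of the "verifying" half, assuming a completed 9x9 grid given as a list of rows of integers. The function name `is_valid_solution` and the sample grid `SOLVED` are illustrative, not taken from anything in this thread. The check is a fixed amount of work (27 groups of 9 cells), while finding a solution from a partial grid generally needs search or backtracking.

```python
from typing import List

Grid = List[List[int]]


def is_valid_solution(grid: Grid) -> bool:
    """Return True if a completed 9x9 Sudoku grid satisfies every constraint."""
    digits = set(range(1, 10))

    # Reject anything that is not a 9x9 grid.
    if len(grid) != 9 or any(len(row) != 9 for row in grid):
        return False

    # Collect the 9 rows, 9 columns and 9 three-by-three boxes as sets.
    rows = [set(row) for row in grid]
    cols = [{grid[r][c] for r in range(9)} for c in range(9)]
    boxes = [
        {grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)}
        for br in (0, 3, 6)
        for bc in (0, 3, 6)
    ]

    # Every group must contain exactly the digits 1 through 9.
    return all(group == digits for group in rows + cols + boxes)


# Hypothetical sample input: a known valid completed grid.
SOLVED: Grid = [
    [int(ch) for ch in line]
    for line in [
        "534678912",
        "672195348",
        "198342567",
        "859761423",
        "426853791",
        "713924856",
        "961537284",
        "287419635",
        "345286179",
    ]
]

if __name__ == "__main__":
    print(is_valid_solution(SOLVED))
```

Swapping any two different cells within a row keeps that row valid but breaks two columns, so the checker rejects it. That is the kind of cheap, mechanical check a reviewer can lean on without redoing the work.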
-
"The training doesn't evaluate 'is the answer true' or 'is the answer useful.'"
Let's pretend RLVF does not exist to give this argument a chance. Then, while the training loop does not validate accuracy directly, I guess the meta-training loop still does. When someone prompts a model, the resulting execution trace shows whether the generated answer is correct or not, and this trace is kept for subsequent training runs. The way coding agents are used productively is not a) generate code with AI and b) run it yourself; it's a) ask the AI to do something, including generating the code and running it too, with no step b. This naturally creates large training sets of correct and incorrect solutions.
-
"We spent billions to create systems used to perform a simulacrum of work."
Have you even tried using these systems to produce valuable work? How could this possibly be your conclusion after having tried them?
Why is that not an embarrassment for everyone who moans and carps and complains about the craft?
I can see a similar problem with this article: the author notices that LLMs produce a lot of errors, then concludes that they are useless and produce only a simulacrum of work. The author has an interesting observation about how LLMs disrupt the way we judge knowledge work. But when he concludes that LLMs do only a simulacrum of work, this is where his arguments fail.
Wait, you're probably talking about the test of discarding a report based on something superficial like spelling errors. Which fails with LLMs due to their basic conman personalities and smooth talking. And therefore ..?
This is not true as stated. I'd try to gloss over the absolutes relative to the context, but if I'm totally honest, I'm not sure I understand what idea you're trying to communicate.