Posted by thehappyfellow 11 hours ago

Simulacrum of Knowledge Work (blog.happyfellow.dev)
108 points | 40 comments
happytoexplain 5 hours ago
"They sound very confident" was a warning I gave a lot on a project a year ago, before I gave up trying to get developers to stop blindly trusting the output and submitting things that were just wrong. That team's documentation went to absolute shit because the developers thought LLMs magically knew everything.
throwaway_sydn 3 hours ago
“/reliable-resources-skill Claude, using the list of approved resources, evaluate the report I’m attaching”
rowanG077 7 hours ago
I don't really agree with the premise of the article. Sure, proxy measures are everywhere. But for knowledge work specifically, you can usually check the real quality. Of course it's not as easy as "oh, this report contains a few spelling errors", but it is doable. If you accepted work purely based on superficial proxy measures, you were not fairly evaluating it at all.
zingar 7 hours ago
I think there's a weaker claim that holds true: we used to be able to ignore lots of content based on superficial signals (and pay proper attention to the work that passed this test), and now we are overwhelmed because everything meets the superficial criteria and we can't pay proper attention to all of it.
thehappyfellow 7 hours ago
That's what I had in mind! The whole post is a claim that evaluating knowledge work got more expensive because cheaper measures stopped correlating well with quality.

If someone was already evaluating the work output using a metric closer to the underlying quality then it might not have been a big shift for them (other than having much more work to evaluate).

rowanG077 7 hours ago
Yes, I agree that this is true!

However, you could only do that if you were fine with judging work unfairly, since you would readily discard quality work based on superficial proxies. Admittedly, that is done in a lot of cases.

mrtesthah 7 hours ago
>"is the RLHF judge happy with the answer."

Reinforcement Learning with Verifiable Rewards (RLVR), used to improve success rates on math and coding, seems like an exception.

balamatom 8 hours ago
>We've automated ourselves into Goodhart's law.

Yes.

This does not however mean that progress is not being made.

It just means the progress is happening along dimensions that are completely illegible in terms of the culture of the early 21st-century Internet, which is to say in terms of the values of the society that produced it.

downboots 7 hours ago
Feels like a parallel with https://en.wikipedia.org/wiki/Constructivism_%28philosophy_o... where "it's not valid until you checked"
balamatom 7 hours ago
I didn't see the connection initially.
simianwords 7 hours ago
The FUD about LLMs will never get old. The way I know and trust LLMs is the same way a manager trusts their reportees to do good work.

For most tasks, the complexity/time required to verify a task is << the time required to do the task itself. Sure, there can be hallucinations in the graph the LLM made. But LLMs hallucinate much less than before, and the time to verify is much lower than the time a human would need to do the task.

I wrote a post detailing this argument https://simianwords.bearblog.dev/the-generation-vs-verificat...

JackSlateur 6 hours ago
FUD? You are missing the point entirely, and so does your blog post.

Are LLMs a good dictionary of synonyms? Perhaps, but is that relevant? Not at all.

Are you biased when a solution is presented to you? Yes, like all humans.

Is it damaging when said solution is brain-dead? Obviously.

Are you failing to understand that most (if not all) of a manager's work is human-centric and, as such, cannot be applied to a non-human? Obviously.

You trust a machine's intent. Joke's on you: it has no intent at all. It will break the "trust" you pour into it without even realizing it.

You say that an LLM does a better job than you. Perhaps this says it all?
