Posted by sambellll 18 hours ago
But logical inference itself is limited. You still have to find out if p is true or not - the ground truth.
How do you find that? You could specify in the prompt that if a resume has p, the model should infer q and act on it. But determining the truth value of p is something an LLM cannot do.
It’s not a limitation of the LLM. It’s the limitation of logic itself. You take 10 humans and give them the resumes with the same rubrics as the LLM. You’ll get a similar range of scores because everyone would assign different values.
The issue is not in logical inference. It’s in determining the value of p, which takes much more than logic. And current LLMs are limited to being logical.
In my experience, cold-applying has always worked essentially as a black hole, and LLMs haven't changed that much. The reality is that alternative avenues are always necessary to get the job you want. That could be a third-party recruiter; reaching out to a hiring manager on LinkedIn; or using your network to get referrals. Those continue to work whether the company is using a bone-headed tool like this or not.
Well done you! It is difficult to avoid architectural complexity, but imho well worth it.
Which sort of sounds workable until you scale it up to larger datasets, where at some point compute/time/energy costs will render it non-viable.
I am sure there’s some reasonable rule of thumb estimation on distribution that could be applied based off fewer runs per data artifact, but you’re always going to be trading off against confidence by doing this.
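To make the runs-versus-confidence trade-off concrete, here is a rough sketch of how the uncertainty on a mean score shrinks as you add runs. It is an illustration, not anyone's production setup: `fake_llm_score` is a hypothetical stand-in for a real scoring call, the noise level is invented, and the interval uses a plain normal approximation (z = 1.96), which is optimistic for very small run counts.

```python
import math
import random
import statistics

def mean_with_interval(scores, z=1.96):
    """Return (mean, half_width) of an approximate 95% interval for the mean.

    Assumes roughly independent runs and a normal approximation.
    """
    mean = statistics.fmean(scores)
    if len(scores) < 2:
        # One run tells you nothing about the spread.
        return mean, float("inf")
    stderr = statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, z * stderr

def fake_llm_score(rng, true_score=70.0, noise=8.0):
    # Hypothetical stand-in for one stochastic LLM scoring call.
    return rng.gauss(true_score, noise)

rng = random.Random(0)
for n in (3, 10, 50):
    runs = [fake_llm_score(rng) for _ in range(n)]
    mean, half = mean_with_interval(runs)
    print(f"{n:>3} runs: {mean:.1f} +/- {half:.1f}")
```

The half-width falls roughly with the square root of the number of runs, so halving the uncertainty costs about four times the calls.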
Beyond this, I’d bet that almost no implemented systems that use LLMs for scoring, ranking, or decision making use such a multi-run approach. Partly that is because people don’t understand that LLM output is stochastic (perhaps because a lot of people without a background in statistics don’t understand what stochastic actually means), and no doubt partly because of budget concerns: if you have to ask an LLM to do the same thing 10, 50, or 100 times to get a sufficiently good result, then the cost-saving argument is either weakened or completely destroyed.
There is at least one more aspect worth considering in the specific case of resumes/CVs: is the inconsistency of scoring by LLM worse than the inconsistency of scoring by a human following a similar process?
Because the reality is that, even for an experienced recruiter, reviewing hundreds or thousands of resumes or CVs gets pretty fatiguing. People get hungry, bored, tired, restless, irritable, etc.
That inevitably leads to inconsistencies creeping in, so there’s always an element of “luck” (or, perhaps better, uncertainty) as to whether your resume/CV passes screening.
So is that inconsistency better or worse with LLM screening? I don’t know. But, at least, if it’s not worse maybe it doesn’t matter for this specific use case. And if it’s notably better then maybe it’s raised the bar on what “good enough” screening looks like?
(And I’m sure other use cases warrant similar, “does it matter?”, questions, with the answers no doubt landing differently.)
I am not currently looking for employment, nor am I currently particularly worried about future prospects if I was suddenly in the position of looking for employment.
But if I ended up in a position with nothing to lean on but scattering my CV everywhere, well…
A lot of my major contributions are littered across the internet, private, or even just verbal/consultancy. They're things I did for free, in my spare time.
I also avoid GitHub. If you just look at my GitHub page for extra context, you would likely miss that delivering that very GitHub page likely involved a few bits of code I wrote.
Now, I could do a better job of trying to document this stuff, so it could be easier to find… But also I can't quite imagine how that would work.
> 30 for personal projects
These are insane weights for scoring a software engineer's resume.
This isn't to diminish the whispernet. Rather, it shows just how many important signals cannot be quantified.
The only drawback I see is that you should compare every pair of CVs for best results, and that grows quadratically with the number of CVs. Of course you can settle for fewer comparisons and imperfect results. But then I'm not sure you can hit a good ratio of quality to token spend.
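For a sense of scale, comparing every pair of n CVs takes n(n-1)/2 comparisons. A quick check (the function name is just for illustration):

```python
import math

def pairwise_comparisons(num_cvs):
    """Number of distinct pairs among num_cvs items: n * (n - 1) / 2."""
    return math.comb(num_cvs, 2)

for n in (10, 100, 1000):
    print(n, pairwise_comparisons(n))
# 10 45
# 100 4950
# 1000 499500
```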
1. Set the Elo rating of every CV to 1000.
2. Randomly pair up CVs and compare. Winners gain Elo, losers lose Elo.
3. Repeat #2 for a few iterations, then remove the bottom X% of CVs.
4. Repeat 2-3 until the number of remaining CVs is small enough to do an exhaustive comparison.
I don't have a mathematical proof, but I suspect that this is a decent cost-effective approximation of comparing every pair (depending on the parameters)
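The steps above could be sketched roughly like this. Treat it as a toy under stated assumptions: `compare(a, b)` is a hypothetical callback (in practice an LLM call) that returns whichever of the two CVs it prefers, CVs are hashable ids, and the K-factor, rounds per cut, cut fraction, and final pool size are arbitrary knobs, not tuned values.

```python
import random

def expected_score(rating_a, rating_b):
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def elo_tournament(cvs, compare, rounds_per_cut=5, cut_fraction=0.25,
                   final_size=8, k=32, seed=0):
    rng = random.Random(seed)
    ratings = {cv: 1000.0 for cv in cvs}          # step 1
    alive = list(cvs)

    def play(a, b):
        # compare() is the hypothetical judge; it returns the preferred CV.
        score_a = 1.0 if compare(a, b) == a else 0.0
        delta = k * (score_a - expected_score(ratings[a], ratings[b]))
        ratings[a] += delta                       # winner gains ...
        ratings[b] -= delta                       # ... what the loser drops

    while len(alive) > final_size:                # step 4
        for _ in range(rounds_per_cut):           # steps 2 and 3
            rng.shuffle(alive)
            # Pair neighbours; with an odd count one CV sits the round out.
            for a, b in zip(alive[::2], alive[1::2]):
                play(a, b)
        alive.sort(key=ratings.get, reverse=True)
        keep = int(len(alive) * (1 - cut_fraction))
        # Always drop at least one CV, never go below the final pool size.
        keep = max(final_size, min(len(alive) - 1, keep))
        alive = alive[:keep]

    # Small enough now: compare every remaining pair once.
    for i, a in enumerate(alive):
        for b in alive[i + 1:]:
            play(a, b)
    return sorted(alive, key=ratings.get, reverse=True)

# Toy usage: CV ids are ints and the "judge" simply prefers the larger id.
finalists = elo_tournament(list(range(40)), compare=lambda a, b: max(a, b))
print(finalists)
```

With a noisy judge, a strong CV can still get unlucky and be cut early; more rounds per cut or a smaller cut fraction reduces that risk at the price of more comparisons.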
Or compare each one to a reference set? Take 5 resumes of existing employees, rank all candidates against that set, maybe you get some useful level prediction into the bargain
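A minimal sketch of that reference-set idea, assuming a hypothetical `beats(candidate, reference)` callback (again, an LLM comparison in practice) that returns True when the candidate is preferred. The cost is linear: one comparison per candidate per reference. The names and numbers in the usage example are made up.

```python
def rank_against_references(candidates, references, beats):
    """Rank candidates by how many reference resumes each one beats."""
    wins = {
        c: sum(1 for ref in references if beats(c, ref))
        for c in candidates
    }
    ranked = sorted(candidates, key=wins.get, reverse=True)
    return ranked, wins

# Toy usage with an invented "strength" number per resume.
strength = {"ref_junior": 2, "ref_mid": 5, "ref_senior": 8,
            "cand_x": 6, "cand_y": 1, "cand_z": 9}
references = ["ref_junior", "ref_mid", "ref_senior"]
candidates = ["cand_x", "cand_y", "cand_z"]

ranked, wins = rank_against_references(
    candidates, references,
    beats=lambda c, ref: strength[c] > strength[ref],
)
print(ranked, wins)
```

If the references span known levels, the win count doubles as a crude level estimate: a candidate who beats the junior and mid references but not the senior one lands somewhere between mid and senior.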