HackerRank open sourced its ATS. My resume scored 90/100. Oh wait 74. No

Posted by sambellll 21 hours ago

HackerRank open sourced its ATS. My resume scored 90/100. Oh wait 74. No – 88 (danunparsed.com)

936 points | 397 comments | page 5

seedless-sensat 11 hours ago|

What is an ATS? This blog doesn't define it

gejose 11 hours ago|

ATS = Applicant Tracking System. It's software to help you manage your hiring pipeline as a whole.

mxuribe 7 hours ago||

I see PDFs mentioned in both the article and the repo. But over the decades that I've been working and applying for roles, almost exclusively in corporate America, I've only been asked for a PDF once! Every other time, everyone wanted a Word doc (.doc/.docx). So are more HR groups now asking for PDFs instead? Or, if someone asks you for a PDF instead of a Word doc, is that a signal that the HR group is employing some sort of agentic review of one's resume (I mean, beyond conventional ATS systems)?

cemoktra 15 hours ago||

So sending my CV to every company three times should get me past the ATS?

cyanydeez 12 hours ago|

If I ever go back into the job market, I'll need three accounts: Peter J Smith, Peter Smith and PJ Smith. They live in #101, #102 and #103, 5607 Jane Street.

left-struck 11 hours ago||

Why stop there? vary everything that can reasonably be varied slightly across each resume

captainbland 10 hours ago||

I think the implication here is that you can almost certainly bias the models to always accept you by including "nudge" phrases like "I demonstrated real world deployments" and "helped develop an application in the context of a complex architecture..."

graemep 12 hours ago||

It took me a minute to figure out what an ATS was. I'm not familiar with this particular meaning of a much-used TLA.

Even better, Wikipedia lists the abbreviation I am familiar with but gives a different interpretation of the same words:

https://en.wikipedia.org/wiki/Ats

Leptonmaniac 12 hours ago|

Thanks for not explaining what TLA is, either.

graemep 11 hours ago||

My sense of humour. TLA = Three Letter Abbreviation.

pmarreck 6 hours ago||

> An LLM is called six times to extract structured information

Well, I think I found your problem

rkuska 17 hours ago||

This reminds me of my former CTO. He would take a bunch of CVs and randomly throw some of them in a bin. He didn’t want to work with “unlucky” people.

psalaun 17 hours ago||

I thought this was only an old urban legend; some people actually use this technique? Especially in a trade supposed to be led by people trained in sciences?

gregates 14 hours ago|||

Given how often it's been mentioned here, it's likely that this is an urban legend that people are pretending to have first-hand knowledge of for karma. In a trade that's supposed to be led by people trained in sciences, no less!

(A more charitable interpretation would be that aforementioned CTO was making a joke that didn't land.)

cyanydeez 12 hours ago||

Or it's so old that people would make the joke and interns would repeat it unwittingly. No one has to consciously be lying for this type of meme to continue spreading.

subscribed 11 hours ago||||

That'd be pretty gross for a CTO if it were real.

aquariusDue 15 hours ago|||

It's OK! We can disguise it as the Secretary Problem and it'll be fine, we could even write a post on the company blog about it. /s

https://en.wikipedia.org/wiki/Secretary_problem

hahahaa 17 hours ago||

The problem is with this system he only worked with unlucky people.

cs02rm0 13 hours ago||

I feel like hiring is all a bit broken. Roles get flooded with applications, it's chance whether your CV gets through, then there's hiring rounds that seem designed to make you quit the process before they have to filter you out.

Is it working for anyone, on any level?

luckylion 13 hours ago|

I'm on the other side, and my main tip (at least if there's people like me!) is: avoid the usual AI signs.

For one role we got ~70 applications and all CVs looked obviously AI-written. I don't know whether the people actually did any of the things mentioned and I don't have the time to find out, so AI-written CVs are a discard signal for me. (Either those people delegated a very important task to AI and didn't even bother to check, or they are bad at using AI and don't know it. I want neither.)

Any CVs that signal they were actually written by a person I will actually look at.

quectophoton 10 hours ago||

> For one role we got ~70 applications and all CVs looked obviously AI-written.

Were those ~70 applications all of them, or were those ~70 applications the result of an AI filtering from a larger amount?

If the latter, are you sure your AI is not filtering out the hand-written CVs and giving you the ones that have been AI-assisted or AI-written (with or without "the usual AI signs")?

fractal618 7 hours ago||

Maybe the ATS has logic for people resubmitting their resume. I don’t know how isolated each test was.

YossarianFrPrez 13 hours ago|

Looking at the linked scoring prompt (resume_evaluation_criteria.jinja) [0], I immediately see several red flags that suggest the output won't be reliable. (I'm developing an LLM intensive application where the stakes are high enough that I need the LLM output to be reasonably correct.)

[0] https://github.com/interviewstreet/hiring-agent/blob/main/pr...

In no particular order:

1. The prompt is trying to get the system to do all of the evaluation steps at once. Instead, the system should break down the task of resume evaluation into its subcomponents and have separate prompts for each component. Like "evaluating open source contributions" should be its own task. Same with "assessing the complexity of software projects on the resume." Fwiw, each of the tasks contained within the prompt is woefully underspecified.
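A minimal sketch of what that decomposition could look like, assuming a hypothetical `call_llm(prompt) -> str` function supplied by the caller. The sub-task names and prompt wording are illustrative and not taken from the repo.

```python
from typing import Callable, Dict

# Hypothetical sub-task prompts; the wording is illustrative, not from the repo.
SUBTASK_PROMPTS: Dict[str, str] = {
    "open_source": "List the open source contributions claimed in this resume:\n{resume}",
    "project_complexity": "Describe the complexity of each software project in this resume:\n{resume}",
}


def evaluate_resume(resume: str, call_llm: Callable[[str], str]) -> Dict[str, str]:
    """Run one narrowly scoped prompt per sub-task instead of one large prompt."""
    results = {}
    for name, template in SUBTASK_PROMPTS.items():
        # Each sub-task gets its own call, so it can be tested and tuned on its own.
        results[name] = call_llm(template.format(resume=resume))
    return results
```

Each entry can then be specified, tested, and logged separately, which is the point of splitting the task up.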

2. The prompt leaves spreads of ~10 points up to the LLM, when it's doubtful that humans are that well calibrated. Take for example:

  > SCORING CRITERIA Open Source (0-35 points) 
  HIGH SCORES (25-35 points):
   - Contributions to popular open source projects (1000+ stars)
   - Significant contributions to well-known projects
   - Google Summer of Code (GSoC) participation
   - Substantial community involvement

Are all of these 35-point examples? Is one a 26-point example? If not, what's the difference? If an expert can't reliably make the judgement, the LLM is going to struggle too. One partial fix is to get rid of the ranges and just say all of these are worth 30 points. An additive point scheme would be better...
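To make the additive idea concrete, here is a rough sketch. The criterion names and point values are made up for illustration (including `maintains_own_project`, which is not in the quoted prompt); only the 0-35 range comes from the prompt. The LLM would answer yes/no per criterion and plain code would do the arithmetic.

```python
# Hypothetical fixed point values; the numbers are illustrative, not from the repo.
OPEN_SOURCE_POINTS = {
    "contributed_to_1000_star_project": 15,
    "gsoc_participation": 10,
    "maintains_own_project": 10,
}
MAX_OPEN_SOURCE = 35  # matches the 0-35 range in the quoted criteria


def open_source_score(findings: dict) -> int:
    """Sum fixed points for each criterion found to be true, capped at the maximum."""
    total = sum(
        points for key, points in OPEN_SOURCE_POINTS.items() if findings.get(key)
    )
    return min(total, MAX_OPEN_SOURCE)
```

With this shape, the same set of yes/no findings always produces the same score, so any run-to-run variance is confined to the findings themselves.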

3. The authors of this prompt have left an incredible number of judgement calls up to the LLM, when that's the very thing you want to minimize. Using the same example as above...

- Are all contributions to open source projects with 1000+ stars equal?

- What counts as a "significant contribution"? Doesn't that imply that the LLM has to know or read through all of the commits in like the last ~6 months at minimum for the project to understand what the given contribution meant to the project? That itself isn't impossible with tool usage, but again, that'd be a separate task.

- What on earth counts as "Substantial community involvement"? Why didn't the prompt authors define this, or at least give a few examples?

Honestly at this point maybe someone should build a tool that scans prompts for adjectives...
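A crude version of such a scanner is only a few lines. This sketch assumes a fixed, hand-picked word list (hypothetical, and it includes non-adjectives too) rather than real part-of-speech tagging, so it will miss plenty.

```python
import re

# Hypothetical list of vague qualifiers; a real tool would likely use part-of-speech tagging.
VAGUE_WORDS = {"significant", "substantial", "popular", "well-known", "complex", "strong"}


def find_vague_words(prompt: str) -> list:
    """Return (line_number, word) pairs for each vague qualifier found in the prompt."""
    hits = []
    for line_number, line in enumerate(prompt.splitlines(), start=1):
        # Words may contain internal hyphens, e.g. "well-known".
        for word in re.findall(r"[A-Za-z][A-Za-z-]*", line):
            if word.lower() in VAGUE_WORDS:
                hits.append((line_number, word.lower()))
    return hits
```

Running it over the quoted "HIGH SCORES" bullets flags "popular", "significant", "well-known" and "substantial", which are exactly the terms the prompt never defines.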

4. This sort of thing is just asking for trouble:

  > SCORES MUST NEVER DEPEND ON:
   Candidate's name, gender, or personal demographic information

Just remove this stuff before you send the rest of the resume to the LLM. Even if you ask it not to, it's not a person, it's a very fancy statistical distribution generator. All of the input (including the name) will affect the distribution that gets generated. (This one is not unlike Andreessen's "don't be a sycophant" prompt.)
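As a sketch of the "remove it first" approach, assuming the resume has already been parsed into a dict (the field names here are hypothetical): drop the sensitive fields and mask the full name wherever it appears in the remaining text. This is far from complete redaction, since it will not catch nicknames, partial names, or pronouns.

```python
import re

# Hypothetical field names for an already-parsed resume.
SENSITIVE_FIELDS = {"name", "gender", "date_of_birth", "photo_url", "email", "phone"}


def redact_resume(parsed: dict) -> dict:
    """Drop sensitive fields and mask the candidate's full name in remaining text fields."""
    name = parsed.get("name", "")
    redacted = {}
    for key, value in parsed.items():
        if key in SENSITIVE_FIELDS:
            continue  # never forwarded to the LLM
        if name and isinstance(value, str):
            value = re.sub(re.escape(name), "[CANDIDATE]", value, flags=re.IGNORECASE)
        redacted[key] = value
    return redacted
```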

5. Obviously this one depends on the LLM in question, but instead of writing things like:

  > DO NOT RETURN A RESUME SUMMARY. RETURN ONLY THE SCORING EVALUATION IN THE SPECIFIED JSON FORMAT. Analyze the following resume and provide a JSON response with this EXACT structure (all fields are required):...

The system should utilize the "structured output" option, which guarantees a fixed output format. Also, fwiw, the JSON should force the LLM to pick between categorical options as much as possible. Forced-choice structured output should, at least in theory, cut down on hallucinatory responses and constrain judgement calls.
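Here is a rough sketch of the forced-choice idea using only the standard library. The field names and categories are hypothetical. How you hand the schema to the model depends on the provider, so this only shows the validation side: anything outside the allowed categories is rejected instead of being silently scored.

```python
import json

# Hypothetical schema: every judgement is a forced choice between fixed categories.
ALLOWED = {
    "open_source_level": {"none", "minor", "major"},
    "has_gsoc": {"yes", "no", "unclear"},
}


def parse_evaluation(raw: str) -> dict:
    """Parse the model's JSON and reject anything outside the allowed categories."""
    data = json.loads(raw)
    if not isinstance(data, dict) or set(data) != set(ALLOWED):
        raise ValueError("response must be an object with exactly the expected fields")
    for field, choices in ALLOWED.items():
        if not isinstance(data[field], str) or data[field] not in choices:
            raise ValueError(f"{field} must be one of {sorted(choices)}")
    return data
```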

6. One major thing that's not in the prompt is anything about traceability. This system should be designed so that humans can review the logs and make sure this is working as intended.
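For traceability, one simple option is to write a JSON line per LLM call. This is just a sketch with made-up field names, not how the repo does logging.

```python
import hashlib
import json
import time


def make_trace_record(task: str, prompt: str, response: str, model: str) -> str:
    """Build one JSON line that ties a result back to its exact input and output."""
    record = {
        "timestamp": time.time(),
        "task": task,
        "model": model,
        # The hash makes it easy to group calls that used the identical prompt.
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "prompt": prompt,
        "response": response,
    }
    return json.dumps(record, sort_keys=True)
```

Appending these lines to a file gives a reviewer the exact prompt and response behind any score, and makes it possible to compare repeated runs on the same resume.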

7. Another thing that is missing in the file is what I'll call evidence of a theory of coding / coder quality. Most of the examples are designed to have the LLM assess proxies for code quality, not code quality itself. Surely both should be taken into account?

I'm not an expert at evaluating coders. But two pretty basic LLM-answerable things I would ask are: How well do a candidate's 5 most recent commit messages match the contents of those commits? Do the claimed technical skills on the resume match their GitHub code? (i.e., if they say they know R, is there any evidence of that on their GitHub?)
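The skills check does not even need an LLM. A sketch, assuming you have already collected a per-repo mapping of language name to bytes of code (GitHub's REST API exposes this per repository; fetching it is left out here, and the function name is mine):

```python
def unsupported_skills(claimed: list, repo_languages: dict) -> list:
    """Return claimed languages that have no code in any of the candidate's repos."""
    seen = {
        language.lower()
        for languages in repo_languages.values()
        for language, size in languages.items()
        if size > 0
    }
    return [skill for skill in claimed if skill.lower() not in seen]
```

For a resume claiming R and Python where the repos only contain Python and HTML, this returns `["R"]`. That is a prompt for a human follow-up question rather than grounds for rejection, since plenty of work never lands on a public GitHub profile.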

8. The prompt also seems unaware of what it's asking the LLM to do:

  > LIVE DEMO BONUS: Projects with working live demos should receive 10-20% higher scores

This implies that the LLM can use tools, but even then, I'd be pretty wary of its ability to fully execute this part of the prompt without more detailed instructions, examples, and guidance. There are very likely tons of edge cases here.
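One way around this is to do the liveness check in ordinary code and apply a fixed bonus, rather than asking the LLM to judge it. A sketch under assumptions: the 10% default is just the low end of the quoted range, and "responds with a 2xx status" is a weak stand-in for "working demo".

```python
import urllib.request
from typing import Callable


def url_is_live(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a 2xx status; any failure counts as not live."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, ValueError):
        # OSError covers network errors, timeouts and HTTP error statuses;
        # ValueError covers malformed URLs.
        return False


def apply_live_demo_bonus(
    base_score: float,
    demo_url: str,
    is_live: Callable[[str], bool] = url_is_live,
    bonus: float = 0.10,
) -> float:
    """Raise the score by a fixed fraction only when the demo URL is verifiably live."""
    if demo_url and is_live(demo_url):
        return round(base_score * (1 + bonus), 2)
    return base_score
```

Even this leaves edge cases open: parked domains, login walls, and demos that load but do not work all return a 2xx status.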

WickyNilliams 22 minutes ago|

Would it be correct to say you have experience building LLM-based workflows like this? I'm guessing so, given your critiques and suggestions of better approaches. Can you recommend any books, sites, or other resources for learning these kinds of dos and don'ts?
