HackerRank open sourced its ATS. My resume scored 90/100. Oh wait 74. No

Posted by sambellll 1 day ago

HackerRank open sourced its ATS. My resume scored 90/100. Oh wait 74. No – 88 (danunparsed.com)

970 points | 406 comments | page 11

nicodjimenez 19 hours ago||

I actually just built an ATS for my company, Mathpix. But it never occurred to me to use resumes. Basically we have a set of company values and a specific open-ended questionnaire to gauge the fit:

https://mathpix.com/careers/apply

Then internally we have dashboards and sorting based on AI agent scoring. I noticed the scoring is imperfect but still saves a lot of time. Candidates scored at or below 2/5 are reliably bad and candidates above 4/5 are consistently impressive and leave thoughtful answers.

The biggest thing is not using resumes. You can’t reliably gauge applicants without a writing sample, and resumes are the worst form of writing sample. Also, you need to be intentional about who you’re hiring for, both to craft the questions and to grade the responses.

mdorazio 18 hours ago|

This seems likely to be worse. How do you screen out people who point an LLM at your values and ask it to answer your questions in a way likely to appeal to a recruiter using an LLM to score the responses?

sp2hari 18 hours ago|

HackerRank CTO & author of this repo here

There's no better feeling than building something open source and watching it take off. Nine months ago, I built a simple hiring agent to solve one very real problem.

Things it is not: It's not an ATS. We don't use it to screen our open roles. Our customers don't use it either.

Here's what it is: Every year at HackerRank, we get 50,000 to 60,000 intern applications. No human can read that many resumes well. So I built something to rank them, helping me decide which resumes to read first.

[This was before we built AI Interviewer (Chakra) to automate the first round of interviews, so candidates are no longer rejected based on their resumes alone.]

Two things worth clarifying since I've seen them come up in this thread:

The default model is gemma3:4b because it's what runs locally on most laptops - no cloud API needed. Actual resumes are evaluated using a top Gemini model. The repo ships with a demo config, not the production one.

The cutoff score was set very low — the system was designed to rank resumes, not reject them. Only resumes at the very bottom of the distribution were filtered out. The vast majority passed through to human review, where the real decisions were made.
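A minimal sketch of the "rank, don't reject" behavior described above. This is not the repo's actual code: `ScoredResume`, `rank_for_review`, and the sample pool are hypothetical names and data, and the cutoff of 20 is taken from the figure mentioned later in this thread.

```python
from dataclasses import dataclass


@dataclass
class ScoredResume:
    candidate: str
    score: float  # 0-100, as produced by some LLM scorer (hypothetical)


def rank_for_review(resumes, cutoff=20.0):
    """Drop only the very bottom of the distribution, then sort best-first."""
    kept = [r for r in resumes if r.score >= cutoff]
    return sorted(kept, key=lambda r: r.score, reverse=True)


# Made-up pool, for illustration only.
pool = [
    ScoredResume("a", 74),
    ScoredResume("b", 12),
    ScoredResume("c", 90),
    ScoredResume("d", 41),
]

queue = rank_for_review(pool)
print([r.candidate for r in queue])
```

With a cutoff this low, the score mostly decides the reading order; only resume "b" is dropped before human review.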

Over the last week, it's taken on a life of its own. People are cloning it, running their own resumes through it, opening issues, sending PRs.

I contributed to open source a lot in college. Somewhere along the way, I drifted away from it. This week reminded me how good that feeling is. This thread has also given me more ideas than I expected. The critiques here are sharp and I'm already thinking about how to act on them. Improvements are coming.

orsorna 18 hours ago||

You know you're not writing for LinkedIn? Platitudes about drifting away, or about watching your project "succeed" by being really popular, are not relevant to the main concerns raised by this piece. In particular, you brushed off the non-deterministic score calculation.

beardedwizard 18 hours ago|||

I'm a bit disappointed to see "The critiques here are sharp", a Claude tell, in a response which (to me) is trying to subtly argue that hackerrank is not overly reliant on LLMs.

I'm not sure if your intent was to come across as having written this yourself, but it did nothing to change my perception that this approach is flawed.

I was also disappointed that you didn't address the variability in scores. I'm inferring that you believe the larger model takes care of the main observation in the post, but I don't really see you directly addressing the points.

Maybe it's just me.

sp2hari 18 hours ago||

There is variability in scores, and that's expected given that we are ultimately using an LLM to score. At least when I used it 7 months ago, the only way I could avoid it was by keeping the cutoff score low (as low as 10 or 20).

Reading this thread, I'm hoping to minimize the variability even further (even though I know it can't be fully removed).
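To illustrate why a low cutoff hides the variability while a high one would not, here is a rough sketch. It is an assumption-laden illustration, not anything from the repo: `summarize_runs` is a hypothetical helper, and the three scores are the ones quoted in the post title.

```python
from statistics import mean, pstdev


def summarize_runs(scores, cutoff):
    """Summarize repeated scores for one resume against a cutoff."""
    passes = [s >= cutoff for s in scores]
    return {
        "mean": mean(scores),
        "spread": pstdev(scores),  # population standard deviation of the runs
        # Stable means every run lands on the same side of the cutoff.
        "stable": all(passes) or not any(passes),
    }


runs = [90, 74, 88]  # the three scores from the post title

low = summarize_runs(runs, cutoff=20)
high = summarize_runs(runs, cutoff=80)
print(low["stable"], high["stable"])
```

The same resume passes on every run with a cutoff of 20, but flips between pass and fail with a cutoff of 80, which is where run-to-run noise starts to matter.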

rendaw 18 hours ago|||

Do you read all ~50,000 then? Just with the ranked ones first?

Or are you using it to screen? I'm confused.

sp2hari 15 hours ago||

There are some with very low scores that were ignored (like < 20).

The rest, those with good scores (more than 40K resumes at least), were reviewed manually.

DiskoHexyl 13 hours ago|||

>>It's not an ATS.

>>No human can read that many resumes well. So I built something to rank them, helping me decide which resumes to read first

Translation: it's an ATS.

>>the system was designed to rank resumes, not reject them

>>Only resumes at the very bottom of the distribution were filtered out

Translation: it was designed to reject the CVs

jkhdigital 18 hours ago|||

Saw this comment at the top with 0 replies and thought “How is that possible??” and then saw the “0 minutes ago” timestamp. Only on HN can you stumble into the comments section just moments after a CTO, founder, author, etc. left unfiltered remarks about the exact topic of the post. Never change HN.

lewispollard 18 hours ago||

Depends how "unfiltered" you consider LLM output to be.

rizsyed1 18 hours ago||

Thank you for your fantastic work!