Posted by sambellll 1 day ago
https://mathpix.com/careers/apply
Then internally we have dashboards and sorting based on AI agent scoring. I noticed the scoring is imperfect but still saves a lot of time. Candidates scored at or below 2/5 are reliably bad and candidates above 4/5 are consistently impressive and leave thoughtful answers.
The biggest thing is not using resumes. You can’t reliably gauge applicants without a writing sample, and resumes are the worst form of writing sample. You also need to be intentional about who you’re hiring for, both to craft the questions and to grade the responses.
There's no better feeling than building something open source and watching it take off. Nine months ago, I built a simple hiring agent to solve one very real problem.
Things it is not: It's not an ATS. We don't use it to screen our open roles. Our customers don't use it either.
Here's what it is: Every year at HackerRank, we get 50,000 to 60,000 intern applications. No human can read that many resumes well. So I built something to rank them, helping me decide which resumes to read first.
[This was before we built AI Interviewer (Chakra) to automate the first round of interviews, so candidates are no longer rejected based on their resumes alone.]
Two things worth clarifying since I've seen them come up in this thread:
The default model is gemma3:4b because it's what runs locally on most laptops - no cloud API needed. Actual resumes are evaluated using a top Gemini model. The repo ships with a demo config, not the production one.
The cutoff score was set very low — the system was designed to rank resumes, not reject them. Only resumes at the very bottom of the distribution were filtered out. The vast majority passed through to human review, where the real decisions were made.
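A minimal sketch of the rank-then-filter idea described above, assuming a hypothetical 0-100 score and a made-up cutoff of 10. The names `Scored`, `rank_for_review`, and the numbers are illustrative assumptions, not taken from the actual repo or its production config.

```python
from dataclasses import dataclass


@dataclass
class Scored:
    candidate_id: str
    score: float  # hypothetical 0-100 score from the model


def rank_for_review(resumes, cutoff=10.0):
    """Drop only the very bottom of the distribution, then sort the rest
    so the highest-scoring resumes are read first."""
    kept = [r for r in resumes if r.score >= cutoff]
    return sorted(kept, key=lambda r: r.score, reverse=True)


# Made-up data for illustration only.
resumes = [
    Scored("a", 72.0),
    Scored("b", 5.0),   # below the (assumed) cutoff, filtered out
    Scored("c", 91.5),
    Scored("d", 40.0),
]

queue = rank_for_review(resumes)
print([r.candidate_id for r in queue])
```

With a low cutoff, almost everything stays in the queue; the score mostly decides reading order rather than who gets dropped.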
Over the last week, it's taken on a life of its own. People are cloning it, running their own resumes through it, opening issues, sending PRs.
I contributed to open source a lot in college. Somewhere along the way, I drifted away from it. This week reminded me how good that feeling is. This thread has also given me more ideas than I expected. The critiques here are sharp and I'm already thinking about how to act on them. Improvements are coming.
I'm not sure if your intent was to come across as having written this yourself, but it did nothing to change my perception that this approach is flawed.
I was also disappointed that you didn't address the variability in scores. I'm inferring that you believe the larger model takes care of the main observation in the post, but I don't really see you directly addressing the points.
Maybe it's just me.
Reading this thread, I'm hoping to minimize the variability even further (even though I know it can't be fully removed).
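One common way to reduce run-to-run variability (though not eliminate it) is to score the same resume several times and use the average. The sketch below is an assumption about how that could look, not what the project does: `stable_score` is a hypothetical helper, and `noisy_scorer` is a stand-in for a real model call.

```python
import random
import statistics


def stable_score(score_once, resume_text, runs=5):
    """Score the same resume `runs` times and return (mean, spread).

    `score_once` is any callable returning a numeric score; the spread
    (population standard deviation) shows how noisy the scorer is.
    """
    samples = [score_once(resume_text) for _ in range(runs)]
    return statistics.mean(samples), statistics.pstdev(samples)


def noisy_scorer(resume_text):
    # Stand-in for an LLM call: a fixed base score plus random noise.
    return 70.0 + random.uniform(-5, 5)


random.seed(0)  # fixed seed so the demo is repeatable
mean, spread = stable_score(noisy_scorer, "example resume text")
print(f"mean={mean:.1f} spread={spread:.1f}")
```

Averaging costs more model calls per resume, so it is a trade-off between stability and spend; a large spread is also a useful signal that a resume deserves a human look.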
Or are you using it to screen? I'm confused.
The rest, those with good scores (more than 40K at least), were reviewed manually.
>>No human can read that many resumes well. So I built something to rank them, helping me decide which resumes to read first
Translation: it's an ATS.
>>the system was designed to rank resumes, not reject them
>>Only resumes at the very bottom of the distribution were filtered out
Translation: it was designed to reject CVs.