Posted by david927 11/9/2025
Ask HN: What Are You Working On? (Nov 2025)
The usual approach to coding assessments doesn't work anymore - companies are looking for AI engineers, yet it's still unclear how to assess AI proficiency.
Our goal is to design challenges that combine prompting and coding, so we can score both how well a candidate prompts and how good the resulting code is. The aim is to bring measurement to AI prompting skills: how well-aligned the prompts are and how candidates handle LLM-generated code.
At the same time, we want to keep a strong human balance in the process: hiring is a two-way street, and screening shouldn’t be fully offloaded to AI. We’re human-first.
Several tasks are already live - you can try them here: https://valuate.dev
Cupcake is a governance/policy-enforcement layer for agents. Its innovation is binding OPA/Rego to agent runtimes (via hooks [1]).
I do not believe we will ever be able to rely strictly on "better" models in the wild without deterministic guarantees or ways for enterprises to factor in their own alignment - system prompts don't cut it.
Stay tuned for the formal release here in a couple of weeks.
[1] https://github.com/anthropics/claude-code/issues/712
Cupcake GitHub: https://github.com/eqtylab/cupcake
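Not Cupcake's actual interface, just a minimal Python sketch of the hook-to-OPA idea under my own assumptions: a local OPA server, a data.agent.allow rule, and a made-up input shape. The agent runtime's pre-tool-call hook forwards the proposed action to OPA's Data API and blocks it unless the policy says yes.

    # Illustrative only: a pre-tool-call hook that asks a local OPA server
    # whether an agent action is allowed, and blocks it otherwise.
    # The policy package (agent.allow), input shape, and OPA address are
    # assumptions, not Cupcake's real interface.
    import requests

    OPA_URL = "http://localhost:8181/v1/data/agent/allow"  # OPA Data API endpoint

    def pre_tool_call_hook(tool_name: str, arguments: dict) -> bool:
        """Return True if the policy allows this tool call, False to block it."""
        payload = {"input": {"tool": tool_name, "arguments": arguments}}
        resp = requests.post(OPA_URL, json=payload, timeout=2)
        resp.raise_for_status()
        # OPA returns {"result": <value of data.agent.allow>}; treat an absent result as deny.
        return bool(resp.json().get("result", False))

    if __name__ == "__main__":
        allowed = pre_tool_call_hook("shell", {"command": "rm -rf /"})
        print("allowed" if allowed else "blocked by policy")

Whatever the model proposes, a deny-by-default Rego rule gets the last word, which is the kind of deterministic guarantee described above.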
My other project is https://eggexplorer.com. It's the site I wish I had when building out my flock of chickens. It lets you see the different characteristics of chicken breeds and which hatcheries sell each breed, as well as which hatcheries sell hatching eggs for each breed.
A lightweight infographic generator (Gemini API → structured layout → export to PDF); a rough sketch of that pipeline follows below.
An AI marketing content tool that takes a topic and outputs research + themed HTML + a printable PDF.
Cleaning up the docs/structure for Schema Scanner, an open-source tool that scans websites and generates Schema.org markup.
Exploring a simple AI search visibility tracker to see how often a brand shows up in ChatGPT / Perplexity / Gemini responses.
Still early, mostly building to understand what’s useful vs. noise, but the patterns have been interesting.
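Since the first item above describes a concrete pipeline (Gemini API → structured layout → export to PDF), here is a purely illustrative sketch of it. The JSON layout schema and the choice of the google-generativeai and reportlab libraries are my assumptions, not the author's stack.

    # Sketch only: Gemini -> structured JSON layout -> simple one-page PDF.
    import json
    import google.generativeai as genai
    from reportlab.lib.pagesizes import A4
    from reportlab.pdfgen import canvas

    genai.configure(api_key="YOUR_API_KEY")
    model = genai.GenerativeModel("gemini-1.5-flash")

    def generate_layout(topic: str) -> dict:
        """Ask Gemini for a structured layout: a title plus a list of text blocks."""
        prompt = (
            f"Create an infographic layout about '{topic}'. "
            'Return JSON like {"title": str, "blocks": [{"heading": str, "body": str}]}.'
        )
        resp = model.generate_content(
            prompt,
            generation_config={"response_mime_type": "application/json"},
        )
        return json.loads(resp.text)

    def export_pdf(layout: dict, path: str = "infographic.pdf") -> None:
        """Render the structured layout into a plain one-page PDF."""
        c = canvas.Canvas(path, pagesize=A4)
        _, height = A4
        y = height - 60
        c.setFont("Helvetica-Bold", 20)
        c.drawString(50, y, layout["title"])
        for block in layout["blocks"]:
            y -= 50
            c.setFont("Helvetica-Bold", 12)
            c.drawString(50, y, block["heading"])
            c.setFont("Helvetica", 10)
            c.drawString(50, y - 15, block["body"][:110])
        c.save()

    if __name__ == "__main__":
        export_pdf(generate_layout("composting at home"))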
Of relevance - my review of Code Complete: https://www.youtube.com/watch?v=QlY0EGWp7rw
Fair warning to any others: it's mostly fantasy and sci-fi with the occasional tech book thrown in.
- Local-first app for comparing hardware builds, down to the individual component feature level: specs, benchmarks, even CPU extension support, lanes, how many speakers a given laptop has, Dolby Atmos or not, screen panel manufacturer(s), etc. Basically, no-nonsense real product comparison for transparent and fast decisions.
A lot of cybersecurity attacks happen because of stolen credentials; one big example is the Shai-Hulud supply chain attack. In a lot of enterprises, credential sprawl is a huge issue, and figuring out who (people, services, AI agents) has access to what systems is a paramount task.
At https://gearsec.io, we are building a platform where access is created via policies. The result is that the enterprise doesn't deal with credentials anymore: it only needs to define policies, nothing more.
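I don't know gearsec's actual design, but here is a toy sketch (everything hypothetical) of the "define policies, not credentials" idea: an access broker checks a policy table and mints a short-lived token on demand, so there is no static credential to sprawl or steal.

    # Purely illustrative: policy-defined access with ephemeral tokens.
    # Policy shape, token scheme, and all names are made up for illustration.
    import secrets
    import time

    POLICIES = [
        # principal kind, principal name, target system, allowed actions
        {"kind": "service", "name": "billing-svc", "system": "payments-db", "actions": {"read"}},
        {"kind": "agent", "name": "support-bot", "system": "ticketing", "actions": {"read", "write"}},
    ]

    def request_access(kind: str, name: str, system: str, action: str):
        """Return a short-lived grant if some policy allows it, else None (default deny)."""
        for p in POLICIES:
            if (p["kind"], p["name"], p["system"]) == (kind, name, system) and action in p["actions"]:
                return {
                    "token": secrets.token_urlsafe(16),  # ephemeral, never stored long-term
                    "expires_at": time.time() + 300,     # five-minute lifetime
                    "system": system,
                    "action": action,
                }
        return None

    print(request_access("agent", "support-bot", "ticketing", "write"))   # granted
    print(request_access("agent", "support-bot", "payments-db", "read"))  # denied -> None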
I would love to know if you have faced this problem and how you are solving it at your workplace!
I have worked as a dev in many different constellations over the years, and I've seen many teams choose between bad options: delaying feature launches for manual translations, shipping incomplete translations with a promise of "we'll translate it later," or, lately, using ChatGPT/LLMs that lose consistency and context and require coordination.
Localhero starts from the premise that translations are part of CI. New strings get translated automatically in GitHub Actions, with glossaries and style guides so it sounds like your product and not generic AI output.
The goal is to help product teams ship localized features without all the coordination and delay.
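Not Localhero's code, but a sketch of what a "translations in CI" step could look like: a script a GitHub Actions job runs to translate only the missing keys, with a glossary pinned in the prompt so terminology stays consistent. The locale file layout, model, and OpenAI client usage are my assumptions.

    # Sketch only: fill in missing translation keys during CI, guided by a glossary.
    import json
    from pathlib import Path
    from openai import OpenAI

    client = OpenAI()  # expects OPENAI_API_KEY in the CI environment
    GLOSSARY = {"workspace": "Arbeitsbereich"}  # pin product terminology

    def translate(text: str, target_lang: str) -> str:
        glossary_note = ", ".join(f"'{k}' -> '{v}'" for k, v in GLOSSARY.items())
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system",
                 "content": f"Translate UI strings to {target_lang}. Use this glossary: {glossary_note}."},
                {"role": "user", "content": text},
            ],
        )
        return resp.choices[0].message.content.strip()

    def sync_locale(source="locales/en.json", target="locales/de.json", lang="German"):
        """Translate only the keys present in the source file but missing from the target."""
        src = json.loads(Path(source).read_text())
        dst_path = Path(target)
        dst = json.loads(dst_path.read_text()) if dst_path.exists() else {}
        for key, value in src.items():
            if key not in dst:
                dst[key] = translate(value, lang)
        dst_path.write_text(json.dumps(dst, ensure_ascii=False, indent=2))

    if __name__ == "__main__":
        sync_locale()

A workflow would run something like this on each pull request and commit the updated locale files back to the branch.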
https://donethat.ai. Passively processing screenshots is obviously pretty sensitive, so it has an option to bring your own (local or remote) LLM; otherwise I process with Gemini and never store any data.
It's in beta right now so if you want to try it you have to enable "proactive chat" in settings.
I also made a list of similar tools out there: https://donethat.ai/compare
The idea is controversial, as the resistance to Recall proved.
Morrissey should be your pet (stop me if you think you've heard this one before)
It should do what Instagram does and transform your activity into visual feedback that's fun to explore, for you or for the people you want to show what you did throughout the day.