Posted by embedding-shape 1/27/2026

Show HN: One Human + One Agent = One Browser From Scratch in 20K LOC (emsh.cat)
Related: https://simonwillison.net/2026/Jan/27/one-human-one-agent-on...
322 points | 153 comments
jFriedensreich 1/28/2026|
My community and I have been waiting for a browserBench for a while now, and I'm happy to see it finally starting. Browsers are arguably one of the most complex and foundational pieces of software, and the ability to create something like this from scratch will be an important evaluation as the limits of what is possible become harder and harder to find.
rahimnathwani 1/27/2026||
This is awesome. Would you be willing to share more about your prompts? I'm particularly interested in how you prompted it to get the first few things working.
embedding-shape 1/27/2026|
Yes, I'm currently putting it all together and will make it public via the blog post. Just need to go through all of it first to ensure nothing secret/private leaks, will update once I've made it public.
dvrp 1/28/2026||
It's interesting to think that—independently of what you think of Cursor's browser implementation being truly "from scratch" or not—the fact that people are implementing browsers from scratch with agents happened because of Cursor's post. In other words, in a twisted and funny way, this browser exists because of Cursor's agent.

This is how we should be thinking about AI safety!

embedding-shape 1/28/2026|
I mean, I wanted to demonstrate further how wrong and misleading I think their initial blog post was, so yeah, I made this because of what they said and marketed :)
pulkas 1/27/2026||
The Mythical Man-Month, revisited
lelele 1/28/2026|
What do you mean?
embedding-shape 1/28/2026||
I think it means they'd like to have a baby with me, and the more agents we can add, the faster the baby can incubate. Usual stuff :)
hedgehog 1/27/2026||
This looks pretty solid. I think you can make this process more efficient by decomposing the problem into layers that are more easily testable, e.g. testing topological relationships of DOM elements after parse, then spatial relationships after layout, then eventually pixels on things like ACID2 or whatever the modern equivalent is. The models can often come up with tests more accurately than they get the code right on the first try. There are often also invariants that can be used to identify bugs without ground truth, e.g. by rendering the page at slightly different widths you can make some assertions about how far elements will move.
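
Roughly the kind of check I mean, as a minimal Python sketch; Box and layout_fn here are hypothetical stand-ins for whatever the engine actually exposes:

    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class Box:
        node_id: str
        x: float
        y: float

    # layout_fn is whatever entry point the engine has: html plus a
    # viewport width in, laid-out boxes out.
    def check_width_invariants(layout_fn: Callable[[str, int], list[Box]],
                               html: str, w: int = 800, dw: int = 16) -> None:
        before = {b.node_id: b for b in layout_fn(html, w)}
        after = {b.node_id: b for b in layout_fn(html, w + dw)}
        for node_id, b in before.items():
            a = after[node_id]
            # Nothing should shift right by more than the extra space.
            assert a.x - b.x <= dw, f"{node_id} jumped {a.x - b.x:+.0f}px"
            # In simple flow layout, more horizontal room means less
            # wrapping, so nothing should move further down the page.
            assert a.y <= b.y, f"{node_id} moved down as the viewport grew"

No ground truth needed: either assertion failing points at a layout bug, or at least at something worth a look.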
embedding-shape 1/27/2026||
> There are often also invariants that can be used to identify bugs without ground truth, e.g. by rendering the page at slightly different widths you can make some assertions about how far elements will move.

That's really interesting and sounds useful! I'm wondering if there are general guidelines/requirements (not specific to browsers) that could kind of "trigger" those things in the agent, without explicitly telling it. I think generally that's how I try to approach prompting.

hedgehog 1/28/2026||
I think if you explain that general idea, the models can figure it out well enough to write it into an implementation plan, at least some of the time. Interesting problem though.
embedding-shape 1/28/2026||
> the models can figure it out well enough to write it into an implementation plan

I'm not having much luck with it; they get lost in their own designs/architectures all the time, even the best models (as far as I've tested). But as long as I drive the design, things don't immediately end up in a ball of spaghetti.

Still trying to figure out better ways of doing that. It feels like we need to focus on tooling that lets us collaborate with LLMs better, rather than trying to replace things with LLMs.

hedgehog 1/28/2026||
Yeah, from what I can tell a lot of design ability is somewhere in the weights, but the models don't regurgitate it without some coaxing. It may be related to the pattern where, after generating some code, you can instruct a model to review it for correctness and it will find and fix many issues. Regarding tooling, there's a major philosophical divide between LLM maximalists, who prefer the model to drive the "agentic" outer loop, and what I'll call "traditionalists", who prefer control to be run by algorithms more closely related to classical AI research. My personal suspicion is that the second branch is greatly under-exploited, but time will tell.
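
To make that second shape concrete, here's a rough Python sketch where a plain loop owns control and calls the model as a subroutine; llm and run_tests are hypothetical hooks for whatever model API and test harness you have:

    from typing import Callable

    # A deliberately plain "traditionalist" outer loop: the algorithm
    # decides what happens next; the model only fills in the blanks.
    def generate_until_green(task: str,
                             llm: Callable[[str], str],
                             run_tests: Callable[[str], str],
                             max_rounds: int = 5) -> str:
        code = llm(f"Write code for: {task}")
        for _ in range(max_rounds):
            failures = run_tests(code)  # empty string means all green
            if not failures:
                return code
            # Explicit review pass: models often catch their own bugs
            # when pointed at concrete failing tests.
            code = llm(f"Fix this code so the tests pass.\n"
                       f"Failures:\n{failures}\n\nCode:\n{code}")
        raise RuntimeError("did not converge within max_rounds")

The maximalist version inverts this: the model itself decides when to run tests, when to review, and when to stop.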
socalgal2 1/28/2026||
the modern equivalent is the Web Platform Tests

https://web-platform-tests.org/

hedgehog 1/28/2026||
Amazing. I think if I were taking on the build-a-browser project, I would pair that with the WHATWG HTML spec to come up with a task list (based on the spec, line by line) linked to the specific tests associated with each task. Then of course you'd need an overall architecture and a behavioral spec for how the browser behaves beyond just rendering. A developer steering the process full time might be able to get within 80% parity of existing browsers in a month. It would be an interesting experiment.
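
As a starting point, something like this rough Python sketch could turn a local WPT checkout into a first-pass task list; it assumes a clone at ./wpt, and the path and filters are only illustrative:

    import os
    from collections import defaultdict

    # Group WPT test files by top-level area (dom/, html/, css/, ...)
    # as a crude seed for per-area task lists.
    def wpt_task_list(root: str = "./wpt") -> dict[str, list[str]]:
        tasks: dict[str, list[str]] = defaultdict(list)
        for dirpath, _, filenames in os.walk(root):
            if "support" in dirpath or "tools" in dirpath:
                continue  # skip harness/support directories
            for name in filenames:
                if name.endswith((".html", ".htm")):
                    area = os.path.relpath(dirpath, root).split(os.sep)[0]
                    tasks[area].append(os.path.join(dirpath, name))
        return tasks

    if __name__ == "__main__":
        for area, tests in sorted(wpt_task_list().items()):
            print(f"[ ] {area}: {len(tests)} tests")

Linking each area back to the matching spec sections would then be the manual (or model-assisted) part.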
embedding-shape 1/28/2026||
> I would pair that with the WHATWG HTML spec

I placed some specifications + WPT into the repository the agent had access to! https://github.com/embedding-shapes/one-agent-one-browser/tr...

But judging by the session logs, it doesn't seem like the agent ever saw them; I never pointed it there, and it seems none of its searches returned anything from there.

I'm slightly curious about doing it from scratch again, but this time explicitly pointing it to the specifications, to see if it gets better or worse.

rvz 1/27/2026||
> I'm going to upgrade my prediction for 2029: I think we're going to get a production-grade web browser built by a small team using AI assistance by then.

That is the Ladybird browser, if that was not already obvious.

dewey 1/27/2026||
For the curious, they have a reasonable AI policy:

https://github.com/LadybirdBrowser/ladybird/blob/master/CONT...

simonw 1/27/2026||
Ladybird (a project I deeply respect) had a several-year head start.
micimize 1/28/2026||
An obvious nice thing here compared to the Cursor post is that the human involvement gives some minimum threshold of confidence that the writer of the post has actually verified the claims they've made :^) It illustrates how human comprehension is itself a valuable "artifact" that we won't soon be able to write off.

My comment on the Cursor post, for context: https://news.ycombinator.com/item?id=46625491

avmich 1/27/2026||
The next thing would probably be an OS. With different APIs, the browser would not be constrained by existing standards. Then comes generating a good set of applications to make working in the OS convenient, starting with the GNU set? And then we could approach CPU architecture, again without being constrained by existing languages or instruction sets. That should be interesting to play with.
forgotpwd16 1/27/2026|
Someone has already done this: https://github.com/viralcode/vib-OS

Also, someone made a similar comment not too long ago, so people surely are curious whether this is possible. Kinda surprised this project's submission didn't get popular.

forgotpwd16 1/27/2026|
Impressive. Very nice. (Let's see Paul Allen's browser. /s) You could say it's Brooks's law in action: what one human and one agent can do in 3 days, one human and hundreds of agents can do in a few weeks. A modern retake of the old joke.

> without using any 3rd party libraries

Seems to be easier for coding agents to implement things from scratch than to use libraries.
