Posted by embedding-shape 1/27/2026
Here's my own screenshot of it rendering my blog - https://bsky.app/profile/simonwillison.net/post/3mdg2oo6bms2... - it handles the layout and CSS gradients really well, renders the SVG feed icon but fails to render a PNG image.
I thought "build a browser that renders HTML+CSS" was the perfect task for demonstrating a massively parallel agent setup because it couldn't be productively achieved in a few thousand lines of code by a single coding agent. Turns out I was wrong!
What it tells me is that "effectively using agents" can be much more important than just throwing tokens at a problem and seeing what comes out. I myself have completely deleted several small vibe-coded projects without even going over the code, because what often happens is that, two days after the code is generated, I realize that I was solving the wrong problem or using the wrong approach.
A coding agent doesn't care. It most likely just does whatever you ask it to do with no pushback. While in some cases it's worth using them to validate an idea, often you dig a deeper hole for yourself if you go down a wrong path in the first place.
Amplifiers, rather than replacements. I think the community at large still thinks LLMs and agents are gonna be "replacing" knowledge, which I think is far from the truth.
I agree, however, with the point that having no prior software engineering skills would make this much more difficult.
So the first day or two, each change takes 20-30 minutes. The next day it takes 30-40 minutes per change, the day after up to an hour, and so on, as the requirements start to interact with each other and with the ball of spaghetti the agents have composed and are now trying to change without breaking other parts.
Contrast that with when you really own the code and design: you can keep going for weeks, and all changes take 20-30 minutes, as on day one. But that also means I'm paying attention to what's going on, so it's not vibe-coding but pair programming with LLMs, and it also requires you to understand the domain, what you're actually aiming for, and the basics of design/architecture.
I built other things too which would not be considered trivial or "simple", or as you say they're architecturally complex, and they involve very domain-specific knowledge about programming languages, compilers, ASTs, databases, high-performance optimizations, etc. And honestly, I've never felt this productive. If I were to set up a company around this, which I believe I could, in the pre-LLM era I'd quite literally have to hire 3-5 experienced engineers with sufficient domain expertise to build this together with me - and I mean not in terms of every possible potential, but for the concrete work I've done in ~2 weeks.
I feel like you have missed emsh's point, which is that AI agents quickly become muddled once your project gets complex.
I feel the same way personally. If I don't know how the AI-generated code fits together, frustration builds for as long as the project continues, precisely because of what they describe: changes take less time at first, then longer and longer, with errors the agent missed, and so on.
I personally vibe-code projects too, but I'll admit this failure mode is real.
I have this feeling that anything really complex will fall over head first if complexity grows a lot and you don't unclog the slop.
This is also why we are seeing "AI slop janitors": humans whose task is to unsloppify the slop.
Personally I have this intuition that AI will create really good small products, there is no denying that, but those were already un-monetizable, or if they weren't, then even in the past they were really easy to replicate; this probably just lowered the friction.
Now if your project is something commercial and large, I don't know how much AI slop people can trust. At some point, if people depend on a project that has these issues, and people can tell whether a project is AI-generated or not, that brings its own problems too.
And I am speaking from experience, after building something like WHMCS in Go with AI. At first I was surprised, and it felt good enough for my own personal use case (gvisor) and maybe some really small providers. But then I wanted it to, say, hook into Proxmox, have the tmate server connected via an API to make re-opening sessions easier, explore the idea of live migration from one box to another, and create drivers for the custom firecrackers-ssh idea that I implemented, once again using AI.
One quickly realizes how complexity adds up in projects and how, as emsh points out, it becomes exponentially harder to use AI.
I have one project Claude is working on right now where I'm testing a setup to attempt to take myself more out of the loop, because that is the hard part. It's "easy" to get an agent to multiply your output. It's hard to make that scale with your willingness to spend on tokens rather than with your ability to read and review and direct.
I've ended up with roughly this (it's nothing particularly special; a rough code sketch follows the list):
- Runs an evaluator that evaluates the current state and assigns scores across multiple metrics.
- If a given score is above a given threshold, expand the test suite automatically.
- If the score is below a given threshold, spawn a "research agent" that investigates why the scores don't meet expectations.
- The research agent delivers a report, which is passed to an implementation agent.
- The main agent re-runs the scoring, and if it doesn't show an improvement on one or more of the metrics, the commit is discarded and notes are made of what was tried and why it failed.
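To make the shape of it concrete, here's a minimal sketch of that loop. Every function here is a made-up stub (not a real API); in practice each one shells out to an agent with a different role prompt:

```rust
// Rough sketch of the control flow; the stubs stand in for
// "shell out to an agent with a role prompt".
struct Scores(Vec<f64>);

impl Scores {
    fn all_above(&self, t: f64) -> bool {
        self.0.iter().all(|s| *s >= t)
    }
    fn any_improved_over(&self, base: &Scores) -> bool {
        self.0.iter().zip(&base.0).any(|(new, old)| new > old)
    }
}

fn run_evaluator() -> Scores { Scores(vec![0.6, 0.8]) }       // stub
fn expand_test_suite() {}                                      // stub
fn research(_scores: &Scores) -> String { "report".into() }    // stub
fn implement(_report: &str) -> String { "commit-sha".into() }  // stub
fn discard(commit: &str, note: &str) {
    println!("dropping {commit}: {note}");
}

fn iteration(threshold: f64) {
    let baseline = run_evaluator();
    if baseline.all_above(threshold) {
        // Scores are healthy: grow coverage instead of touching code.
        expand_test_suite();
        return;
    }
    // Below threshold: research why first, then implement from the report.
    let report = research(&baseline);
    let commit = implement(&report);
    // Keep the commit only if at least one metric actually moved.
    if !run_evaluator().any_improved_over(&baseline) {
        discard(&commit, "no metric improved; noting what was tried");
    }
}

fn main() {
    iteration(0.75);
}
```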
It takes a bit of trial and error to get it right (e.g. "it's the test suite that is wrong" came up early, and the main agent was almost talked into revising the test suite to remove the "problematic" tests), but a division sort of like this lets Claude do more sensible stuff for me. Throwing away commits feels drastic - an option is to let it run a little cycle of commit -> evaluate -> redo a few times before the final judgement, maybe - but so far it feels like it'll scale better. Less crap makes it into the project.
And I think this will work better than treating these agents as if they were developers whose output costs 100x as much.
Code so cheap it is disposable should change the workflows.
So while I agree this is a better demonstration of a good way to build a browser, it's a less interesting demonstration as well. Now that we've seen people show that something like FastRender is possible, expect people to experiment with similarly ambitious projects but with more thought put into scoring/evaluation, including on code size and dependencies.
Just the day(s) before, I was thinking about this too, and I think what will make the biggest difference is humans who possess "Good Taste". I wrote a bunch about it here: https://emsh.cat/good-taste/
I think the ending is most apt, and where I think we're going wrong right now:
> I feel like we're building the wrong things. The whole vibe right now is "replace the human part" instead of "make better tools for the human part". I don't want a machine that replaces my taste, I want tools that help me use my taste better; see the cut faster, compare directions, compare architectural choices, find where I've missed things, catch when we're going into generics, and help me make sharper intentional choices.
But for other projects, being able to scale with little or no human involvement suddenly turns things that were borderline profitable, or impossible to make profitable at all at current salaries, into viable businesses at current token costs.
Where it works, it's a paradigm shift - for both good and bad.
So it depends what you're trying to solve for. I have projects in both categories.
The one that people couldn't compile, and was largely a failed attempt to stitch together existing libraries?
It's great to see Hacker News be such a core part of it haha.
> I thought "build a browser that renders HTML+CSS" was the perfect task for demonstrating a massively parallel agent setup because it couldn't be productively achieved in a few thousand lines of code by a single coding agent. Turns out I was wrong!
I do wonder if tech people, present or future, are going to see this as a David vs. Goliath story: 20K LOC, 1 human and 1 agent, beating a $5 million, 1.6 million LOC browser, changing how even the massive AI users/pioneers of the time thought about the use of AI.
Maybe it's because I've watched some documentaries recently, but why do I feel like a documentary could be made about this whole thing in the future?
But also, more and more I feel like AI is an absolute black box: nobody knows how to do things, but we are all kind of running experiments with it and seeing what sticks (like how we now have a compelling demonstration that 1 human + 1 agent > many agents with no human in the loop).
And this is when we are only one month into 2026; who knows what other experiments and proofs will happen this year to reveal more about this black box and its usefulness, or lack thereof.
Simon, it would be interesting if you could revisit HN's 2026-predictions thread each month or quarterly to see how many people were right or wrong about AI as we figure more things out.
After three days, I have it working with around 20K LOC, where ~14K is the browser engine itself + X11, and 6K is just Windows+macOS support.
Source code + CI built binaries are available here if you wanna try it out: https://github.com/embedding-shapes/one-agent-one-browser
it's amazing how far we've come in 20 years. i was a (very minor) contributor to khtml/konqueror (before apple got involved w/ webkit) in the early 2000s, and back then it was such a labor intensive process to even create a halfway working engine. like, months of work just to get basic rendering somewhat correct on a very small portion of the web (which was obv much smaller)
in addition to agentic coding, i think for this specific task having css-spec/html-spec/web-platform-tests as machine readable test suites helps a LOT. the agent can actually validate against real specs.
back in the day, despite having gecko as an open source reference, in practice the "standards" were whatever IE was doing. so you'd spend weeks implementing something only to discover every site was coded for IE's quirks lmao. for all of their other faults, google/apple and other contributors helped bring in discipline to that.
You know, I placed the specs in the repository with that goal (even sneaked in a repo that needs compiling before being usable), but as far as I can see, the agent never actually peeked into that directory nor read anything from them in the end.
It'll be easier to see once I've made all the agent sessions public, and I might be wrong (I didn't observe the agent at all times), but it seems the agent never used them.
very excited to see the agentic sessions when you release them... that kind of transparency is super valuable for the community. i can see "build a browser from scratch" becoming a popular challenge as people explore the limits of agentic coding and try to figure out best practices for workflows/prompting. like the new "build a ray tracer", or say nanoGPT but for agents.
That'll be dope. The tokens used (input, output, total) are actually saved within codex's jsonl files.
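If anyone wants to tally them up across a session, something like this sketch should get close. The field names are my guess at the schema (I haven't verified them against the actual files), and it needs the serde_json crate:

```rust
// Hypothetical tally of token usage from a session .jsonl file.
// The "usage"/"input_tokens"/"output_tokens" field names are assumed,
// not verified against codex's actual format; adjust as needed.
use std::fs::File;
use std::io::{BufRead, BufReader};

fn main() -> std::io::Result<()> {
    let reader = BufReader::new(File::open("session.jsonl")?);
    let (mut input, mut output) = (0u64, 0u64);
    for line in reader.lines() {
        // Each line is one JSON object; skip anything that doesn't parse.
        let Ok(v) = serde_json::from_str::<serde_json::Value>(&line?) else {
            continue;
        };
        input += v["usage"]["input_tokens"].as_u64().unwrap_or(0);
        output += v["usage"]["output_tokens"].as_u64().unwrap_or(0);
    }
    println!("input: {input}, output: {output}, total: {}", input + output);
    Ok(())
}
```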
I would happily use local models if I could get them to perform, but they’re super slow if I bump their context window high, and I haven’t seen good orchestrators that keep context limited enough.
I've been very skeptical of the real usefulness of code assistants, in large part from my own experience. They work great for brand new code bases, but struggle with maintenance. Seeing your final result, I'm eager to see the process, especially the iteration.
I searched for "security" and "vuln" in both the article and this discussion thread, and found no matches.
I guess the code being in Rust helps, but to what extent can one just rely on guarantees provided by the language?
(I know practically nothing about Rust.)
I don't think Rust helps much beyond preventing some very basic issues. For example, I don't think it even checks that URLs aren't referencing local files on disk; who knows how the path handling works, you might be able to put absolute paths on remote pages and load local content? Unsure, but it wouldn't surprise me.
It might be a bit safer due to having no JS engine, so even if someone did what I outlined above, they couldn't really exfiltrate anything; there are no POST/PUT requests or forms or anything :)
I'm sure if someone did a proper audit they'd find double-digit high severity issues, at least.
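For the local-file concern specifically, even a dumb allowlist on the URL scheme would close the most obvious hole. Something like the following - purely hypothetical, not code from the actual project - is roughly what I'd look for in an audit:

```rust
// Hypothetical guard, not from the actual codebase: only fetch plain
// http(s) URLs for resources referenced by a remote page, so a page
// can't point the loader at local files on disk.
fn is_fetchable(url: &str) -> bool {
    let lower = url.to_ascii_lowercase();
    lower.starts_with("http://") || lower.starts_with("https://")
}

fn main() {
    assert!(is_fetchable("https://example.com/style.css"));
    assert!(!is_fetchable("FILE:///etc/passwd")); // scheme check is case-insensitive
    assert!(!is_fetchable("/etc/passwd"));        // bare absolute path
}
```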
> I get to evaluate on stuff like links being consistently blue and underlined
Yeah, this browser doesn't have a "default stylesheet" like a regular browser. I probably should have added one, but I was mostly just curious about rendering websites as they arrive from the web, rather than as browsers think the web should look.
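If I were to add one, it could be as small as a constant parsed once at startup and dropped in at the lowest cascade priority. Something like this sketch (illustrative only, not in the repo):

```rust
// Illustrative only, not in the repo: a tiny user-agent stylesheet
// that would sit below any page CSS in the cascade, giving links the
// classic blue-and-underlined defaults.
const USER_AGENT_CSS: &str = r#"
    a { color: #0000ee; text-decoration: underline; }
    a:visited { color: #551a8b; }
    body { margin: 8px; }
"#;

fn main() {
    // In the engine this would be fed to the CSS parser at startup;
    // printing it just keeps the sketch self-contained.
    println!("{USER_AGENT_CSS}");
}
```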
> It may be that some of the rendering is not supported on windows- the back button certainly isn't.
Hmm, on Windows 11 the back button should definitely work, I tried that just last night. Are you perhaps on Windows 10? I haven't tried that myself; it should work, but that might be why.
Yep, I ran it on an old windows 10 VM I had puttering about.
I think it must have a default link styling somewhere, as some links are the classic blue that, as far as I know, I intentionally styled to be black - but this could be CSS spaghetti in tufte.css finally coming back to haunt me.
Well, that's how this browser came to be, because I felt something similar about how Cursor presented their results :) So I guess we're in the same club, somehow.
And yeah, lots of websites render poorly, for obvious reasons; whether it's better or worse than Cursor's will, I guess, be up to the public. I'm sure if I actually treated it as a professional project I could probably get it to work quite nicely rather than the abomination it currently is.
But we're very far from a browser here, so that's not that impressive. Writing a basic renderer is really not that hard, and matches the effort and low LoC from that experiment. This is similar to countless graphical toolkits that have been written since the 70s.
I know Servo has a "no AI contribution" policy, but I still would be more impressed by a Servo fork that gets missing APIs implemented by an AI, with WPT tests passing etc. It's a lot less marketable I guess. Go add something like WebTransport for instance, it's a recent API so the spec should be properly written and there's a good test suite.
The fact that it compiles is better than the Cursor dude's. "It compiles" is a very low bar for working software.
Unfortunately, this context is kind of implicit; I don't actually mention it in the blog post, which I probably should have done. That's my fault.
That's why taking a step back and looking at what's actually hard in the process and bad in the output felt like the more sensible thing to chase, rather than anything else.
FWIW I ran your binary and was pleasantly surprised, but my low expectations probably helped ;)
The next challenge I think would be to prove that no reference implementation code leaked into the produced code. And finally, this being the work product of an AI process, you can't claim copyright, but someone else could claim infringement, so beware of that little loophole.
I think the focus with LLM-assisted coding for me has been just that, assisted coding, not trying to replace whole people. It's still me and my ideas driving (and my "Good Taste", explained here: https://emsh.cat/good-taste/), while the LLM does all the things I find more boring.
> prove that no reference implementation code leaked into the produced code
Hmm, yeah, I'm not 100% sure how to approach this, open to ideas. Basic text comparison feels like it'd be too dumb; using an LLM for it might work, perhaps letting it reference the other codebase. Honestly, I don't know how I'd do that.
> And finally, this being the work product of an AI process you can't claim copyright, but someone else could claim infringement so beware of that little loophole.
Good point to be aware of. I guess by instinct I didn't actually add any license to this project; I thought of adding MIT as I usually do, but I didn't actually make any of this, so I ended up not assigning any license. Worst case scenario, I guess most jurisdictions would deem either that there's no copyright or that I (implicitly) hold it. Guess we'll deal with that if we get there :)
The license implicitly defaults to "I own all the rights", so no one is able to override that implicit license by copying the code and slapping their own license on top. I'm not sure if this is what you were thinking about when you said "claims copyright once can add whatever"?
Then on a different note, I'm not licensing/selling/providing any terms, so it's next to impossible for someone to credibly claim I warranted anything; there are no terms in the first place, except any implicit ones.
Maybe it works differently in the US, and because Microsoft is in the US, that can somehow matter for me. But I'm not too worried about it :)
Thanks for the consideration and care though, that's always appreciated! :)
I know it's a little apples-and-oranges (you and the agent wouldn't produce the exact same thing), but I'm not asking because I'm interested in the man-hour savings. Rather, I want to get a perspective on what kind of expertise went into the guidance (without having to read all the guidance and be familiar with browser implementation myself). "How long this would have taken the author" seems like one possible proxy for "how much pre-existing experience went into this agent's guidance".
I don't think I'd be able to do this on my own. Not that I don't know Rust, but because I don't know X11 (nor macOS or Windows) well enough to even know where to begin.
I've been a Linux user for almost two decades, so I know my way around my system, but I never developed X11 applications or anything; I'm mostly a web developer who jumped around various roles through the years. Having spent a lot of time caring deeply about testing, infrastructure, architecture/design, and communication between humans might have given me a slight edge in programming together with agents.
The prompts themselves were basically "I'd like this website to render correctly: https://medium.com, here's how it looks for me in Firefox with JavaScript turned off: [Image], figure out what features are missing, add them one-by-one, add regression tests and follow REQUIREMENTS.md and AGENTS.md closely" and various iterations/variations of that, so I didn't expressly ask it to implement specific CSS/HTML features, as far as I can remember. Maybe for the first 2-3 prompts I did; I'll upload all the session files in a viewable way so everyone can see for themselves what exactly went on :)
If you paste https://github.com/embedding-shapes/one-agent-one-browser into the "GitHub Repository" tab it estimates 4.58 person-years and $618,599 by year-2000 standards, or 5.61 years and $1,381,079 according to my very non-trustworthy 2025 estimate upgrade.
Note that I started the project in Nov 2023 and can only work on it maybe 1-2 hours a day because it's just a side project.
So I think your tool either estimates based on very bad programmers, or it's just wrong. Or maybe 10x programmers are real and I am him
To me this is a case where knowing that you don't have data is better than having data and pretending it means anything
I wonder if you've looked into what it would take to implement accessibility while maintaining your no-Rust-dependencies rule. On Windows and macOS, it's straightforward enough to implement UI Automation and the Cocoa NSAccessibility protocols respectively. On Unix/X11, as I see it, your options are:
1. Implement AT-SPI with a new from-scratch D-Bus implementation.
2. Implement AT-SPI with one of the D-Bus C libraries (GLib, libdbus, or sdbus).
3. Use GTK, or maybe Qt.
But I think this is one of those experiments that I need to put a halt to sooner rather than later, because the scope can always grow, my mind really likes those sorts of projects, and I don't have the time for that right now :)
You're not the only one to say this; maybe there is value in a minimal HTML+CSS browser that still works with the modern (non-JS) web, although I'm not sure how much.
Another idea I had was to pile another experiment on top of this one, more like "N humans + N agents = one browser", in a collaborative fashion. Let's see if that ends up happening :)
I'll keep them in mind for the future, who knows, maybe some interesting iteration could be done on what's been made so far.