Posted by embedding-shape 9 hours ago
Most of the time when I develop professionally, I restart the session after each successful change. For this project, I initially tried to let one session go as long as possible, but eventually I reverted to my old behavior of restarting from 0 after successful changes.
For knowing what file it should read/write, it most commonly uses `ls`, `tree` and `ag`. There is no out-of-band indexing or anything, just a unix shell controlled by an LLM via tool calls.
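As a rough illustration of what "a unix shell controlled by an LLM via tool calls" can look like, here is a minimal sketch in Python. The function name `run_shell_tool`, the shape of the returned dictionary, and the timeout and truncation limits are all assumptions made up for this example, not how any particular agent implements it.

```python
import subprocess


def run_shell_tool(command: str, timeout_s: float = 30.0, max_chars: int = 4000) -> dict:
    """Run one shell command on behalf of the model and return a result it can read.

    Hypothetical tool shape: the name, arguments and return format are
    assumptions for this sketch, not any real agent's API.
    """
    try:
        completed = subprocess.run(
            command,
            shell=True,           # the model sends a plain shell command line
            capture_output=True,  # collect stdout and stderr instead of printing
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return {"exit_code": None, "output": f"timed out after {timeout_s}s"}

    output = completed.stdout + completed.stderr
    # Truncate so a huge listing doesn't flood the model's context window.
    if len(output) > max_chars:
        output = output[:max_chars] + "\n[truncated]"
    return {"exit_code": completed.returncode, "output": output}
```

The model would then ask for something like `run_shell_tool("ls src")` or `run_shell_tool("ag 'fn layout'")` and decide which file to open from the text that comes back. Running arbitrary model-generated commands like this is only reasonable inside a sandbox or throwaway checkout.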
> since getting the agent to self-select the right scope is usually the main bottleneck
I haven't found this to ever be the bottleneck. What agent and model are you using?
That's why taking a step back, seeing what's actually hard in the process and bad with the output, felt like it made more sense to chase after than anything else.
FWIW I ran your binary and was pleasantly surprised, but my low expectations probably helped ;)
The next challenge I think would be to prove that no reference implementation code leaked into the produced code. And finally, this being the work product of an AI process you can't claim copyright, but someone else could claim infringement so beware of that little loophole.
I think the focus with LLM-assisted coding for me has been just that, assisted coding, not trying to replace whole people. It's still me and my ideas driving (and my "Good Taste", explained here: https://emsh.cat/good-taste/), and the LLM does all the things I find more boring.
> prove that no reference implementation code leaked into the produced code
Hmm, yeah, I'm not 100% sure how to approach this, open to ideas. Basic comparing text feels like it'd be too dumb, using an LLM for it might work, letting it reference other codebase perhaps. Honestly, don't know how I'd do that.
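For what it's worth, the "dumb" plain-text comparison could be sketched roughly like this: split both codebases into tokens, collect overlapping windows of n tokens ("shingles"), and measure how many of the produced code's windows also appear in the reference. The function names, the crude tokenizer and the default window size are assumptions for illustration. This would only catch near-verbatim copying; it would miss renamed identifiers or code translated from another language.

```python
import re


def token_shingles(source: str, n: int = 8) -> set:
    """Return the set of n-token windows ("shingles") found in a source text."""
    # Crude tokenizer (an assumption): identifiers, numbers, or any single
    # non-space symbol. Whitespace is dropped, so reformatting doesn't matter.
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", source)
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def shared_fraction(candidate: str, reference: str, n: int = 8) -> float:
    """Fraction of the candidate's shingles that also occur in the reference."""
    cand = token_shingles(candidate, n)
    if not cand:
        return 0.0
    return len(cand & token_shingles(reference, n)) / len(cand)
```

A high `shared_fraction` for a file would be a hint to look closer, not proof of copying, and a low one doesn't prove independence either.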
> And finally, this being the work product of an AI process you can't claim copyright, but someone else could claim infringement so beware of that little loophole.
Good point to be aware of, and I guess I by instinct didn't actually add any license to this project. I thought of adding MIT as I usually do, but I didn't actually make any of this so ended up not assigning any license. Worst case scenario, I guess most jurisdictions would deem either no copyright or that I (implicitly) hold copyright. Guess we'll take that if we get there :)
That is Ladybird Browser if that was not already obvious.
https://github.com/LadybirdBrowser/ladybird/blob/master/CONT...
It's great to see him make this. I didn't know that he had a blog but looks good to me. Bookmarked now.
Although Cursor burned $5 million, I feel like we got to see that experiment, and now we have embedding-shape's takeaway:
If one person with one agent can produce equal or better results than "hundreds of agents for weeks", then the answer to the question: "Can we scale autonomous coding by throwing more agents at a problem?", probably has a more pessimistic answer than some expected.
To me, this effectively answers the question: what if we had thousands of AI agents that could build a complex project autonomously, with no human involved? That idea seems dead now. Keeping humans in the loop will give much higher productivity and a better end result.
I feel like the lure behind the Cursor project was to find out whether it's able to replace humans completely in an extremely large project, and the answer right now is no (and I have a feeling [bias?] that the answer's gonna stay that way).
Emsh I have a question tho, can you tell me about your background if possible? Have you been involved in browser development or any related endeavours or was this a first new one for you? From what I can feel/have talked with you, I do feel like the answer's yes that you have worked in browser space but I am still curious to know the answer.
A question that comes to my mind: how much difference would there be between 1 expert human + 1 agent, 1 non-expert (say, a junior dev) + 1 agent, and 1 complete non-expert (say, a normal, less techie person) + 1 agent?
What are your predictions on it?
How would the economics of becoming an "expert" or becoming a jack of all trades (junior dev) in a field fare with this new technology/toy that we got?
How much productivity gain could there be going from non-expert -> junior dev, and the same question for junior -> senior dev, in this particular context?
[0] Cursor Is Lying To Developers… : https://www.youtube.com/watch?v=U7s_CaI93Mo
(If it was that's bad news for them as a company that sells tools to human developers!)
It was about scaling coding agents up to much larger projects by coordinating and running them in parallel. They chose a web browser for that not because they wanted to build a web browser, but because it seemed like the ideal example of a well specified but enormous (million line+) project which multiple parallel agents could take on where a single agent wouldn't be able to make progress.
embedding-shape's project here disproves that last bit - that you need parallel agents to build a competent web renderer - by achieving a more impressive result with just one Codex agent in a few days.
The way I saw things, Cursor was/is still targeted very heavily at vibe coding, in a similar fashion to bolt.dev or lovable. I even saw some vibe-coding YouTubers compare them, and honestly, in the end Cursor had preferable pricing to the other two, and that's how I saw Cursor.
Of course Cursor's for the more techie person as well but I feel as if they would shift more and more towards Claude Code or similar which are subsidized by the provider (Anthropic) itself, something not possible for Cursor to do unless burning big B's which it already has done.
So Cursor's growth was definitely towards the more vibe coders side.
Now coming to my main point: I had the feeling that what Cursor was trying to achieve wasn't replacing humans entirely but removing humans from the loop, aka vibe coding. Instead of having engineers, if the Cursor experiment had been successful, the idea (which people felt instantly when it was first released) was that engineering itself would've been dead, and the jobs would've turned into management from a bird's-eye view (not managing agents individually, being aware of what they did, or being in the loop in any capacity).
I feel like this might've been the reason they were willing to burn $5 million.
If you could convince engineers that they are better off being managers, given that browsers are treated as the holy grail of hardness, then a vibe coding product like Cursor would be really lucrative.
At least that's my understanding. I can be wrong, I usually am, and I don't have anything against Cursor. (I actually used to use Cursor earlier.)
But embedding-shape's project shows that engineering is very much still alive and net beneficial. He produced a better result at minimal cost than a project with $5 million in inference costs.
> embedding-shape's project here disproves that last bit - that you need parallel agents to build a competent web renderer - by achieving a more impressive result with just one Codex agent in a few days.
Simon, I think the browser projects got the idea of using autonomous agents partially because of your really famous post about how independent tests can lead to easier ports via agents. Browsers have a lot of independent tests.
So Simon, perhaps I may have over-generalized, but do you know of any cases where parallel agents are actually a good idea now that browsers are off the table? Personally, after this project, I can't really think of any. When the Cursor thing first launched, or when I first heard of it recently, I thought that browsers did make sense for some reason, but now that it's out of the window, I am not sure if there are any other projects where massively parallel agents might even be net positive over 1 human + 1 agent, as Emsh did.
The reason I got excited about the Cursor FastRender example was that it seemed like the first genuine example of thousands of agents achieving something that couldn't be achieved in another way... and then embedding-shape went and undermined it with 20,000 lines of single-agent Rust!
I kind of left the agents to do what they wanted just asking for a port.
Your website does look rotated and the image is the only thing visible in my golang port.
Let me open source it & I will probably try to hammer it some more after I wake up to see how good Kimi is in real world tasks.
https://github.com/SerJaimeLannister/golang-browser
I must admit that it's not working right now. I can't even replicate the earlier state where your website first displayed (really glitchy, with the image zoomed); now it shows only white. Although, oops, looks like I forgot the i in your name and wrote willson instead of willison, as I wasn't wearing specs. Sorry about that.
Now let me see... yeah, now it's displaying something which is extremely glitchy.
https://github.com/SerJaimeLannister/golang-browser/blob/mai...
I have a file to show how glitchy it is. If anything, I just want someone to tinker with whether a golang project can reasonably be made out of this rust project.
Simon, I see that you were also interested in go vibe coding haha, this project has independent tests too! Perhaps you can try this out as well and see how it goes! It would be interesting to see stuff then!
Alright time for me to sleep now, good night!
https://bsky.app/profile/emsh.cat/post/3mdgobfq4as2p
But basically I got curious, and you can see from my other comments to you how much I love golang, so I decided to port the project from rust to golang, and emsh predicts that the project's codebase can even shrink to 10k!
(although one point tho is that I don't have CC, I am trying it out on the recently released Kimi k2.5 model and their code but I decided to use that to see the real world use case of an open source model as well!)
Edit: I had written this comment just 2 minutes before you wrote but then I decided to write the golang project
I mean, I think I ate through all of my 200 queries in kimi code, and it now does display me something (a browser?). I used the shell script that tests against your website as the test, but it only opens up blank.
I am gonna go sleep so that the 5 hour limits can get recharged again and I will continue this project.
I think it will be really interesting to see this project in golang. There must be a good reason for emsh to say the project can be ~10k in golang.
Oh no, don't read too much into my wild guesses! Very hunch-based, and I'm only human after all.
The things that make me think this is still a huge project include:
1. JavaScript and the DOM. There's a LOT there, especially making sure that when the DOM is updated the page layout reflows promptly and correctly.
2. Security. Browsers are an incredibly high-risk environment, especially once you start implementing JavaScript. There are a ton of complex specs involved here too, like CORS and CSP and iframe sandbox and so on. I want these to be airtight and I want solid demonstrations of how airtight they are.
3. WebAssembly in its various flavors, including WebGPU and WebGL
4. It has to be able to render the real Web - starting with huge and complex existing applications like Google Maps and Google Docs and then working through that long tail of weird old buggy websites that the other browsers have all managed to render.
I expect that will keep people pretty busy for a while yet, no matter how many agents they throw at it.
And yes there's definitely still a lot to do. Security is def a big one.
Very exciting time to be alive.
> I lead AI & Engineering at Boon AI (Startup building AI for Construction).