Show HN: Showboat and Rodney, so agents can demo what they've built (simonwillison.net)

Posted by simonw 5 hours ago

80 points | 47 comments

samuelson 1 hour ago

I love your content, but I wish you'd make your blog theme responsive for wider screens/non-mobile. I prefer to read content like this on a large screen.

Showboat seems like it could actually be quite useful for humans too, just for making quick notes from a CLI without opening an editor. The "pop" command makes me wonder if there would be a benefit to also having an array-like interface in addition to the stack-like one. It seems like it would be fairly trivial to generate an index of markdown blocks so that they could be edited individually.
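
A minimal sketch of that indexing idea, assuming blocks are separated by blank lines and fenced code blocks are kept whole. The `index_blocks` and `replace_block` names are hypothetical and this is not how Showboat itself stores or parses documents:

```python
FENCE = "`" * 3  # a Markdown code fence marker, built here to avoid a literal fence


def index_blocks(markdown: str) -> list[str]:
    """Split a Markdown document into blocks separated by blank lines.

    Blank lines inside a fenced code block do not end the block.
    """
    blocks, current, in_fence = [], [], False
    for line in markdown.splitlines():
        if line.startswith(FENCE):
            in_fence = not in_fence
        if not line.strip() and not in_fence:
            # A blank line outside a fence closes the current block
            if current:
                blocks.append("\n".join(current))
                current = []
        else:
            current.append(line)
    if current:
        blocks.append("\n".join(current))
    return blocks


def replace_block(markdown: str, index: int, new_text: str) -> str:
    """Return the document with the block at `index` swapped for `new_text`."""
    blocks = index_blocks(markdown)
    blocks[index] = new_text
    return "\n\n".join(blocks) + "\n"
```

With an index like this, "pop" is just `blocks[:-1]`, and editing block 3 no longer means popping everything after it.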

I like the idea of Rodney, but I wonder if you might actually have better results by asking the agent to generate equivalent Selenium scripts instead. I'm specifically suggesting Selenium because it's been around so long that I assume there's a lot of Selenium in the LLMs' training data, but there are other options that might work too.

simonw 1 hour ago

First time someone's asked for the site to be wider! I have it set up so that on a wide screen the text is still a readable width. Do you think it needs to bump up that max width a bit more?

I've found the models are so good at Playwright that I don't consider Selenium any more. Rodney is my first experiment not using Playwright.

TheKnack 53 minutes ago

I second the request to make the site responsive. When I load the page the CSS constrains the main content to 560px and the whole page is constrained to 940px. Here's how it displays on my system:

https://i.postimg.cc/zDMD9nYD/Simon.png

dwb 29 minutes ago

Your tastes are your own, and there is an argument for just filling the window, but you won’t find a typographic authority that advocates setting body text much wider than that (and I would agree with them).

simonw 23 minutes ago

Can you take a screenshot of some other site that is wider but has a comfortable reading width for you?

cadamsdotcom 1 hour ago

Great to see you doing red/green TDD, Simon!

Passing tests in your repo are great documentation of the tool at a microscopic level. And rerunning tests only burns tokens on failures (since passed tests just print a dot) so it’s token efficient too.

Some other neat tricks:

- For greater efficiency, configure your test runner to print nothing (not even a dot/filename) for test successes. Agents don’t need progress dots, only the exit code & failure details.

- Have your agent implement a 10ms timeout per test. pytest has hooks to do this. The agent will see tests time out and mock out all I/O and third-party code (why test what one assumes third parties tested already?). The suite then becomes CPU-bound, with no shared database, no shared data and no tests that interfere with or depend on each other, so tests can run in parallel.
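
A rough sketch of the per-test time budget, using only the standard library. The `time_limit` and `run_with_budget` helpers and the two sample tests are hypothetical names; in a real suite you would presumably call something like this from a pytest hook (or reach for the pytest-timeout plugin) rather than by hand. It relies on `SIGALRM`, so it assumes Unix and the main thread:

```python
import signal
import time
from contextlib import contextmanager


@contextmanager
def time_limit(seconds: float):
    """Raise TimeoutError if the enclosed block runs longer than `seconds`."""
    def _on_alarm(signum, frame):
        raise TimeoutError(f"exceeded {seconds * 1000:.0f}ms budget")

    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)  # sub-second timer
    try:
        yield
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)  # restore the old handler


def run_with_budget(test_fn, seconds: float = 0.01) -> str:
    """Run one test function under a time budget and report the outcome."""
    try:
        with time_limit(seconds):
            test_fn()
    except TimeoutError:
        return "timed out"
    return "passed"


def fast_test():
    # Pure computation: comfortably inside a small budget
    assert sum(range(100)) == 4950


def slow_test():
    # Stands in for real I/O such as a network call or a database query
    time.sleep(0.2)
```

Calling `run_with_budget(slow_test)` reports a timeout, which is the signal that nudges the agent to mock out whatever the test was waiting on.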

simonw 21 minutes ago

That timeout trick is very neat!

I'm OK with longer-running tests because I always have them run against a real database (often SQLite, sometimes PostgreSQL) and real files created in temporary directories, but I can see how the time limit might be useful for tests that don't need those kinds of components.

Hansenq 4 hours ago

I was a bit confused as to how everything works until I read it in detail. Really cool tools, but I think one thing that would help in the introduction is: saying explicitly that the generated .md document is for you (the user) to read through, observe the output of the CLI call, and ensure that the output matches what you would expect.

It's basically an automated test, but at a higher abstraction level and with manual verification--using CLI tools rather than a test harness. Really great work!

giancarlostoro 4 hours ago

I'll be sure to try these out. I've been building my own alternative to Beads with a concept called "gates" which do not let you close tasks as complete until a gate passes. Would love to throw these in as "gates" for my current workflow.

johnfn 3 hours ago

Out of curiosity, what is the advantage of using Rodney when Playwright has the same set of features and AI understands how to write a Playwright script very well?

simonw 3 hours ago

Maybe not a lot.

Showboat documents look neater if there are single one-line commands that do something useful. Dumping a full Playwright script into a cell is less readable.

Showboat also has a special feature where you can embed an image directly in the document by running:

  showboat image doc.md 'rodney screenshot'

The command you call should return a path to an image file as the last line of output. Rodney does exactly that.
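
To make that contract concrete, here is a small sketch of the consuming side: run a command and treat the last line it prints as the image path. This is not Showboat's actual implementation; `image_path_from_command` is a hypothetical name, and skipping trailing blank lines is an assumption on my part:

```python
import subprocess


def image_path_from_command(command: list[str]) -> str:
    """Run `command` and return the last non-empty line of its stdout.

    Earlier lines (progress messages and so on) are ignored, so a tool can
    log freely as long as the image path is the final thing it prints.
    """
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    lines = [line.strip() for line in result.stdout.splitlines() if line.strip()]
    if not lines:
        raise ValueError("command produced no output")
    return lines[-1]
```

Any tool that follows the same "path on the last line" convention should slot in the same way.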

It may well turn out that Rodney is unnecessary and people find better patterns using Showboat with existing tools like playwright-cli - in which case it won't matter because Showboat and Rodney aren't coupled to each other at all.

Showboat is definitely the more significant of the two projects.

Sharlin 2 hours ago

I can't wait for tools that allow agents to hold stand-ups, retrospectives and sprint planning sessions, all facilitated by an agentic scrum master.

eclipxe 1 hour ago

My clawdbot setup does just that. No joke.

eliben 4 hours ago

Very interesting! I encountered the problems these tools are trying to tackle just recently while trying to guide an agent into creating an in-browser tool for me. Closing the loop on a web interface isn't as simple as CLI-only tools. I should give this a try.

It's also interesting that you've shifted to Go for your agent-coded CLI tools, Simon.

simonw 4 hours ago

I'm dabbling with Go at the moment for small tools, mainly as an excuse to learn a new language but also because having a single standalone binary is convenient for shuttling these tiny little tools around.

... but then I'm mostly running them with "uvx name-of-tool" because it turns out Python's packaging infrastructure for binary tools is so good!

eliben 4 hours ago

Right, standalone binaries for CLI tools are great. And if one has Go installed, they can just `go run ...` any tool from its GitHub path; all installation/build/caching happens automagically (meaning the execution is immediate after the first run).

But I can definitely see how someone with `uv` muscle memory wants everything in the same command.

`uv` is the best thing that happened to the Python ecosystem since... I don't know... maybe NumPy.

markusw 4 hours ago

If you're coming from the Python world, definitely. I find `go install github.com/simonw/rodney@latest` equally easy. :D Although you need the Go tooling installed, of course. But very much agreed, Go is great for CLIs!

sNyZZzzz 2 hours ago

Using Markdown as both docs and executable output is cool, but I’m curious how it scales when agents hit more complex UI.

simlevesque 2 hours ago

Rodney seems to be pretty much the same as agent-browser: https://github.com/vercel-labs/agent-browser

simonw 2 hours ago

Hah! I hadn't seen that one before. Yeah, the CLI design is very similar.

Main difference is Rodney can be installed as a single Go binary or via uv/pip, while agent-browser is Rust and npm.

Looks like agent-browser was first released at the start of January, so it's very new.

simlevesque 52 minutes ago

Yeah it's pretty new indeed. It's very effective at doing pretty much any browser automation task and I have to say that using it with the included skill is pretty seamless.

mentalgear 3 hours ago

A bit like Jupyter notebooks, isn't it?

simonw 2 hours ago

Yes, very much so. It's a much thinner, less feature-rich alternative.

It would be interesting to experiment with Jupyter notebooks as an alternative that could work in Claude Code for web.

I had a poke around just now and couldn't find an existing CLI tool that lets you build those up a section at a time in the same way as Showboat. I did find this Python library though:

    uv run --with nbformat python -c '
    import nbformat

    # Build a notebook in memory: one Markdown cell followed by three code cells
    nb = nbformat.v4.new_notebook()
    nb.cells.append(nbformat.v4.new_markdown_cell("# NBTerm Exploration"))
    nb.cells.append(nbformat.v4.new_code_cell("import sys\nprint(f\"Python {sys.version}\")"))
    nb.cells.append(nbformat.v4.new_code_cell("x = [i**2 for i in range(10)]\nprint(x)"))
    nb.cells.append(nbformat.v4.new_code_cell("sum(x)"))

    # Write it to disk; the cells are saved unexecuted, with no outputs
    with open("demo.ipynb", "w") as f:
        nbformat.write(nb, f)
    '

So you could tell the agent to run code like that and then inspect the `demo.ipynb` notebook later on. It doesn't show the result of evaluating the cells though; you need to run this afterwards to have that happen:

    uv run --with nbformat --with nbclient --with ipykernel python -c '
    import nbformat
    from nbclient import NotebookClient

    # Load the unexecuted notebook and run every cell in a fresh kernel
    nb = nbformat.read("demo.ipynb", as_version=4)
    client = NotebookClient(nb, timeout=60)
    client.execute()

    # Save a copy that now includes the cell outputs
    nbformat.write(nb, "demo_executed.ipynb")
    '
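
For the "inspect it later" step, a sketch that reads the text outputs back out of an executed notebook. An `.ipynb` file is JSON, so this uses only the standard library; the `cell_outputs` helper is a hypothetical name, and it only looks at stream output and `text/plain` results, ignoring images and errors:

```python
import json


def cell_outputs(notebook_path: str) -> list[tuple[str, str]]:
    """Return (source, text output) pairs for each code cell in a notebook."""
    with open(notebook_path, encoding="utf-8") as f:
        nb = json.load(f)

    results = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        # "source" and "text" may be a string or a list of strings;
        # "".join() handles both and yields a single string
        source = "".join(cell.get("source", []))
        texts = []
        for out in cell.get("outputs", []):
            kind = out.get("output_type")
            if kind == "stream":
                texts.append("".join(out.get("text", [])))
            elif kind in ("execute_result", "display_data"):
                texts.append("".join(out.get("data", {}).get("text/plain", [])))
        results.append((source, "".join(texts)))
    return results
```

An agent (or a human) could then loop over `cell_outputs("demo_executed.ipynb")` to check that each cell produced what was expected.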

mentalgear 2 hours ago

Cool, I have to say I find the idea intriguing as a traceability tool, in that LLMs can show you step by step how a program was assembled or an output was generated.

samuelson 1 hour ago

I think it's more about the interface than the output. The agent can add stuff to a markdown file with simple CLI commands rather than a more complex editor or file interface.
