Launch HN: Spine Swarm (YC S23) – AI agents that collaborate on a visual canvas

Posted by a24venka 6 hours ago

Hey HN! We're Ashwin and Akshay from Spine AI (https://www.getspine.ai). Spine Swarm is a multi-agent system that works on an infinite visual canvas to complete complex non-coding projects: competitive analysis, financial modeling, SEO audits, pitch decks, interactive prototypes, and more. Here's a video of it in action: https://www.youtube.com/watch?v=R_2-ggpZz0Q.

We've been friends for over 13 years. We took our first ML course together at NTU, in a part of campus called North Spine, which is where the name comes from. We went through YC in S23 and have spent about 3 years building Spine across many product iterations.

The core idea: chat is the wrong interface for complex AI work. It's a linear thread, and real projects aren't linear. Sure, you can ask a chatbot to reference the financial model from earlier in the thread, or run research and market sizing together, but you're trusting the model to juggle that context implicitly. There's no way to see how it's connecting the pieces, no way to correct one step without rerunning everything, and no way to branch off and explore two strategies side by side. ChatGPT was a demo that blew up, and chat stuck around as the default interface because of that momentum, not because it's the right abstraction. We thought humans and agents needed a real workspace where the structure of the work is explicit and user-controllable, not hidden inside a context window.

So we built an infinite visual canvas where you think in blocks instead of threads. Each block is our abstraction on top of AI models. There are dedicated block types for LLM calls, image generation, web browsing, apps, slides, spreadsheets, and more. Think of them as Lego bricks for AI workflows: each one does something specific, but they can be snapped together and composed in many different ways. You can connect any block to any other block, and that connection guarantees the passing of context regardless of block type. The whole system is model-agnostic, so in a single workflow you can go from an OpenAI LLM call, to an image generation model like Nano Banana Pro, to Claude generating an interactive app, each block using whatever model fits best. Multiple blocks can fan out from the same input, analyzing it in different ways with different models, then feed their outputs into a downstream block that synthesizes the results.
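
To make the fan-out and synthesis pattern concrete, here is a minimal sketch in Python. It is not Spine's actual API: the `Block` class, its fields, and the lambda "models" are all hypothetical stand-ins, and the only idea taken from the description above is that a connection passes upstream output as context to the downstream block.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Block:
    """Hypothetical block: a named step that turns upstream context into output."""
    name: str
    kind: str                              # label only, e.g. "llm" or "image"
    run: Callable[[List[str]], str]        # stand-in for a real model call
    inputs: List["Block"] = field(default_factory=list)
    output: Optional[str] = None

    def connect_from(self, *upstream: "Block") -> "Block":
        # A connection means: upstream output becomes part of this block's context.
        self.inputs.extend(upstream)
        return self

    def execute(self) -> str:
        # Run upstream blocks first (once each), then this block.
        if self.output is None:
            context = [block.execute() for block in self.inputs]
            self.output = self.run(context)
        return self.output


# One input fans out to two blocks, which feed a synthesizing block.
idea = Block("idea", "text", lambda ctx: "a note-taking app")
critique = Block("critique", "llm", lambda ctx: f"critique of {ctx[0]}").connect_from(idea)
prd = Block("prd", "llm", lambda ctx: f"PRD for {ctx[0]}").connect_from(idea)
summary = Block("summary", "llm", lambda ctx: " | ".join(ctx)).connect_from(critique, prd)

print(summary.execute())
```

Because each block caches its output, the shared `idea` block runs once even though two branches depend on it.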

The first version of the canvas was fully manual. Users entered prompts, chose models, ran blocks, and made connections themselves. It clicked with founders and product managers because they could branch in different directions from the same starting point: take a product idea and generate a prototype in one branch, a PRD in another, a competitive critique in a third, and a pitch deck in a fourth, all sharing the same upstream context. But new users didn't want to learn the interface. They kept asking us to build a chat layer that would generate and connect blocks on their behalf, to replicate the way we were using the tool. So we built that, and in doing so discovered something we didn't expect: the agents were capable of running autonomously for hours, producing complete deliverables. It turned out agents could run longer and keep their context windows clean by delegating work to blocks and storing intermediary context on the canvas, rather than holding everything in a single context window.

Here's how it works now. When you submit a task, a central orchestrator decomposes it into subtasks and delegates each to specialized persona agents. These agents operate on the canvas blocks and can override default settings, primarily the model and prompt, to fit each subtask. Agents pick the best model for each block and sometimes run the same block with multiple models to compare and synthesize outputs. Multiple agents work in parallel when their subtasks don't have dependencies, and downstream agents automatically receive context from upstream work. The user doesn't configure any of this. You can also dispatch multiple tasks at once and the system will queue dependent ones or start independent ones immediately.
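
The scheduling behavior described above (independent subtasks run in parallel, dependent ones wait for upstream work) can be sketched with `asyncio`. This is only an illustration under assumptions: the `plan` dictionary, the subtask names, and the `run_subtask` and `orchestrate` helpers are invented here, the sleep stands in for real agent work, and the plan is assumed to be acyclic.

```python
import asyncio


async def run_subtask(name, deps, done, results, log):
    # Block until every upstream subtask has finished.
    for dep in deps:
        await done[dep].wait()
    log.append(f"start {name}")
    await asyncio.sleep(0.01)  # placeholder for the agent's actual work
    # Downstream work receives upstream results as its context.
    results[name] = f"{name}({', '.join(results[d] for d in deps)})"
    done[name].set()


async def orchestrate(plan):
    # plan maps each subtask name to the names it depends on (assumed acyclic).
    done = {name: asyncio.Event() for name in plan}
    results, log = {}, []
    await asyncio.gather(
        *(run_subtask(name, deps, done, results, log) for name, deps in plan.items())
    )
    return results, log


plan = {
    "seo_audit": [],
    "competitor_research": [],
    "growth_roadmap": ["seo_audit", "competitor_research"],
    "slide_deck": ["growth_roadmap"],
}
results, log = asyncio.run(orchestrate(plan))
print(log)
print(results["slide_deck"])
```

In this sketch the two independent subtasks start immediately, while `growth_roadmap` and `slide_deck` only start once their upstream events are set.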

Agents aren't fully autonomous by default. Any agent can pause execution and ask the user for clarification or feedback before continuing, which keeps the human in the loop where it matters. And once agents have produced output, you can select a subset of blocks on the canvas and iterate on them through the chat without rerunning the entire workflow.

The canvas gives agents something that filesystems and message-passing don't: a persistent, structured representation of the entire project that any agent can read and contribute to at any point. In typical multi-agent systems, context degrades as it passes between agents. The canvas addresses this because agents store intermediary results in blocks rather than trying to hold everything in memory, and they leave explicit structured handoffs designed to be consumed efficiently by the next agent in the chain. Every step is also fully auditable, so you can trace exactly how each agent arrived at its conclusions.
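
As a rough illustration of what "auditable" could mean in practice, the sketch below walks a block's upstream connections to list every step that contributed to it. The `canvas` dictionary layout, the block ids, and the agent names are assumptions made for this example, not Spine's real data model.

```python
def trace(canvas, block_id, seen=None):
    """Return (block_id, agent) pairs for every step feeding into block_id,
    upstream steps first, each listed once."""
    seen = set() if seen is None else seen
    if block_id in seen:
        return []
    seen.add(block_id)
    steps = []
    for upstream in canvas[block_id]["inputs"]:
        steps.extend(trace(canvas, upstream, seen))
    steps.append((block_id, canvas[block_id]["agent"]))
    return steps


# Hypothetical canvas: each block records who wrote it and what it read.
canvas = {
    "search_results": {"agent": "researcher", "inputs": []},
    "pricing_table": {"agent": "analyst", "inputs": ["search_results"]},
    "market_size": {"agent": "analyst", "inputs": ["search_results"]},
    "answer": {"agent": "writer", "inputs": ["pricing_table", "market_size"]},
}

for block_id, agent in trace(canvas, "answer"):
    print(f"{block_id} (written by {agent})")
```

Since every intermediate result lives in a block with explicit inputs, the chain behind any conclusion can be recovered by following those links instead of inspecting a model's context window.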

We ran benchmarks to validate what we were seeing. On Google DeepMind's DeepSearchQA, which is 900 questions spanning 17 fields, each structured as a causal chain where each step depends on completing the previous one, Spine Swarm scored 87.6% on the full dataset with zero human intervention. For the benchmark we used a subset of block types relevant to the questions (LLM calls, web browsing, table) and removed irrelevant ones like document, spreadsheet, and slide generation. We also disabled human clarification so agents ran fully independently. The agents were not just auditable but also state of the art. The auditability also exposed actual errors in an older benchmark (GAIA Level 3), cases where the expected answer was wrong or ambiguous, which you'd never catch with a black-box pipeline. We detail the methodology, architecture, and benchmark errors in the full writeup: https://blog.getspine.ai/spine-swarm-hits-1-on-gaia-level-3-...

Benchmarks measure accuracy on closed-ended questions. Turns out the same architecture also leads to better open-ended outputs like decks, reports, and prototypes with minimal supervision. We've seen early users split into two camps: some watch the agents work and jump in to redirect mid-flow, others queue a task and come back to a finished deliverable. Both work because the canvas preserves the full chain of work, so you can audit or intervene whenever you want.

A good first task to try: give it your website URL and ask for a full SEO analysis, competitive landscape, and a prioritized growth roadmap with a slide deck. You'll see multiple agents spin up on the canvas simultaneously. People have also used it for fundraising pitch decks with financial models, prototyping features from screenshots and PRDs, competitive analysis reports, and deep-dive learning plans that research a topic from multiple angles and produce structured material you can explore further.

Pricing is usage-based credits tied to block usage and the underlying models used. Agents tend to use more credits than manual workflows because they're tuned to get you the best possible outcome, which means they pick the best blocks and do more work. Details here: https://www.getspine.ai/pricing. There's a free tier, and one honest caveat: we sized it to let you try a real task, but tasks vary in complexity. If you run out before you've had a proper chance to explore, email us at founders@getspine.ai and we'll work with you.

We'd love your feedback on the experience: what worked, what didn't, and where it fell short. We're also curious how others here approach complex, multi-step AI work beyond coding. What tools are you using, and what breaks first? We'll be in the comments all day.

69 points | 59 comments

TheTaytay 4 hours ago

I think this is really neat. You should probably take it as a compliment that the biggest criticisms so far are about the website landing page. ;)

I like canvases in general, and I especially like them for mentally organizing and referring to this sort of broad work. (Honestly, I think zoomable canvases would make a better window manager in general, but I digress)

One small piece of friction: My default mouse-based ways of dragging the canvas around (that work in most canvases like Figma) aren't working. I saw that you had a tutorial, and I have learned to hold space now, but I prefer the "hold middle mouse button to drag my canvas view around".

I've got a couple of research tasks running now, and my current open questions as a very new user are: 1) How easy will it be to store the outputs in a GitHub repository? 2) How easy will it be to refer back to this later? 3) Can I build upon it manually or automatically? 4) Can I (securely) share it with someone else for them to see and build upon it? 5) Can I do something "locally" with it? Not necessarily the model, but my preferred interface for LLMs at this point is Claude Code. Could I have a Claude Code instance running in one of these boxes somehow? 6) What if I want to do private stuff with it and don't like the traffic going through Spine's servers? Could I pay them for the interface, but bring my own keys? (Related: Can I self-host somehow?) 7) When this is done, each artifact it found (screenshot, webpage, etc.) is going to be helpful. The data-hoarder in me wants to make sure I can search these later. Heck, if I could do that, this would become my preferred "web browser". (But again, I digress.)

a24venka 4 hours ago

Really appreciate the detailed feedback and questions! And yes, we'll take the website criticism as a compliment :)

Good callout on the canvas navigation, we'll look into middle mouse button support.

To answer your questions: 1) GitHub integration is on our roadmap. Right now you can export outputs manually but we want to make this seamless. 2) All your canvases are saved and you can search them by name in your dashboard. We're also working on a dedicated section for deliverables across canvases. 3) Yes to both! You can manually add or edit blocks, or kick off new agent runs that build on existing work. 4) Currently you can only share your canvas with others through a public link (but you can make it private at any point). We are testing a teams feature that lets you share canvases securely with members of your team. Beyond that, roles and email-based sharing controls are on our roadmap. 5) Claude Code in a block is a really interesting idea. We don't support that today but we're thinking about computer use and coding workflows. 6) BYOK (bring your own keys) is something we've heard interest in and are considering. Self-hosting isn't available right now, though we do support private deployments for enterprise customers if that's ever relevant. 7) Love the 'preferred web browser' framing. Right now you can search canvases, but searchable artifacts across canvases is definitely where we want to head.

Thanks for giving it a real spin, this kind of feedback is incredibly valuable.

swyx 2 hours ago

> And yes, we'll take the website criticism as a compliment :)

ugh. guys. come on. stop celebrating at the 1 yard line. people are telling you they didn't even look at the product because your landing page was so bad. you wasted your launch HN linking directly to it, ofc that's the first thing people are going to give feedback on. fix it right now you still have time.

levelsofself 41 minutes ago

We run 13 AI agents in production on a $24/month VPS. Key things we learned: 1) File-based memory beats databases for LLM ops (portability, readability, no ORM). 2) Preflight checks on every edit prevent 90% of incidents. 3) Hash-chained audit logs are cheap insurance. 4) Ollama for chat responses, Claude for complex tasks - saves 95% on API costs. Open sourced the governance layer: https://github.com/levelsofself/mcp-nervous-system
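
For readers unfamiliar with point 3, a hash-chained audit log can be sketched roughly as follows. This is a generic illustration using Python's standard `hashlib` and `json` modules, not code from the linked repository; the entry layout and the `append_entry` and `verify` helpers are assumptions made for the example.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(event, prev_hash):
    # Hash the event together with the previous entry's hash.
    body = json.dumps({"event": event, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_entry(log, event):
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev_hash, "hash": _digest(event, prev_hash)})


def verify(log):
    # Recompute every hash; any edited or removed entry breaks the chain.
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(entry["event"], prev_hash):
            return False
        prev_hash = entry["hash"]
    return True


log = []
append_entry(log, {"agent": "writer", "action": "edit", "file": "notes.md"})
append_entry(log, {"agent": "reviewer", "action": "approve", "file": "notes.md"})
print(verify(log))

log[0]["event"]["action"] = "delete"  # tamper with an earlier entry
print(verify(log))
```

Each entry commits to the one before it, so rewriting history requires recomputing every later hash, which is what makes the log cheap to check.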

jcims 25 minutes ago

Got some great results for a rather broad domain in the first pass.

HN is going to tend towards negative/constructive feedback; for me the only issue is that the mouse interaction is a bit wonky. Took me a minute to realize that I could select different mouse modes. With that, I'd echo TheTaytay's comment about mouse interaction, and for me generating docx (which was the output of my agents; I haven't even explored explicitly asking for something else) creates a bit of a barrier to using the content. Markdown or even HTML would be helpful.

But these are just minor nits, love the concept and great execution.

maliker 51 minutes ago

It might just be me, but this interface is the first time I felt the desire to interact with long-running agents even though I use chat interfaces all day long. Maybe it was the demo video on the landing page which was compelling with its examples. Maybe it was the feeling that I could see what was going on because I would be on a canvas. Nicely done!

Off to keep iterating on the prototype app I started...

a24venka 37 minutes ago

This is really great to hear, thank you! Have fun with the prototype, let us know how it goes.

johnyzee 4 hours ago

Calling it a 'canvas' makes me think that this tool is about AI agents doing some kind of collaborative drawing. Looking at the vid though, it seems more like an environment for visually organizing and managing agentic work (which seems very cool, and quite a bit more than just a canvas).

a24venka 4 hours ago

Agreed. The term has been overloaded lately. We also refer to it as a visual workspace which perhaps captures it a bit better.

BloondAndDoom 5 hours ago

I didn’t read the post, I checked out the website just like 99% of the people will do.

Simple advice: if you are selling a product with a selling point of being visual, show it on your website. Not in a YouTube video, but with actual screenshots or a short 10-second video/GIF.

onion2k 4 hours ago

It's a shame the team don't have access to a product that would automatically research and implement what's needed on an AI product website.

a24venka 5 hours ago

Definite miss on our part, we're working on making the product experience more visible upfront on our landing page.

salomonk_mur 4 hours ago

Friend, in the age of AI and even more so if you are selling an AI product, all you need is literally 2 screenshots and one prompt.

metalliqaz 4 hours ago

There is an inverse relationship between how obviously useful a product is and how easy it is to produce screenshots.

ryhanshannon 2 hours ago

I like the overall idea and presentation. In trying it out, I hit the token cap before my trial task was able to complete and show me the end result. I'm sure your free-tier token costs are non-trivial but it was definitely a bummer that I couldn't even see one initial run's output to decide if I wanted to pay.

I decided to gamble the one month fee to let it continue, but the payment defaulting to annual was jarring. I can see it lets you advertise a lower price but that only made me more tempted to leave altogether when I saw the price go up on the final screen.

airstrike 4 hours ago

Congrats on the launch! Meta comment, but I just ain't reading all of the above. You need to be able to explain this in about 20% of the number of words or you'll lose people, especially VCs.

My advice is to start with "Spine Swarm solves _____" then how, then why you're different. 3 short paragraphs, preferably 1-2 sentences each.

a24venka 4 hours ago

Agreed. We will make sure this comes through on our website.

jeingham 4 hours ago

I have not seen my final report yet for my query around machinist.com. However I would like to say my initial impression is very positive. At least in terms of the digestion of my somewhat nebulous request. I like the way your app was able to burrow down to pain points I have experienced and am trying to work out in terms of product market fit for the domain. I look forward to exploring more and giving you more feedback when I see the final report. I will also add that I am looking forward to using your product to explore other opportunities that I'm sure are out there in this age of AI.

a24venka 3 hours ago

This is great to hear, thank you! Would love to hear your thoughts once you see your final report and explore some of those other opportunities.

aleda145 4 hours ago

Super cool!

I'm completely sold on the canvas layer. Embracing nonlinearity is such a boon when you're at the ideas stage. Once you have verified an idea, though, moving it to another medium (a document, presentation or just code) is often the best choice.

Do you see the canvases created with Spine as "one off" that you discard when you have got your deliverable, or as something living that you keep around?

I'm building a side project for running SQL on a canvas (kavla.dev), so I'm thinking about canvas workflows all the time!

a24venka 4 hours ago

Thanks! Great question. We see canvases as living workspaces: you can revisit, iterate on, and build on them over time.

But the deliverables (docs, slides, code) are first-class outputs you can export and use independently. So it works both ways depending on the workflow.

Kavla looks cool, canvas-based SQL is a great use case for this kind of thinking!

aleda145 4 hours ago

Nice! I'll make sure to try out Spine this weekend, if you want detailed feedback feel free to email me. You can find it in my profile.
