Dynamic Workflows in Claude Code

Posted by mil22 3 hours ago

Dynamic Workflows in Claude Code (claude.com)

109 points | 92 comments

SkyPuncher 3 hours ago|

I don't really get this. At this point, my limiting factor is not how quickly Claude can self-trudge through code. It's whether Claude is going to do the task correctly or not.

I need more mechanisms for controlling long-running sessions and dynamically injecting my thoughts, corrections, and nudges rather than faster ways to burn through my tokens without knowing if the results are going to be correct.

wrs 2 hours ago||

I think the theoretical answer here is this:

"Agents address the problem from independent angles, other agents try to refute what they found, and the run keeps iterating until the answers converge."

So you will be supplying the "ground truth" (test suite, detailed spec, whatever) and empower an agent to use it to guide the other agents. Currently a lot of people do this sequentially in the form of multiple code-review passes by fresh agent sessions looking at the work of previous sessions.

Adversarial models are a longstanding technique in ML so it makes sense they would try to go this way.
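
To make the quoted loop concrete, here is a minimal Python sketch. It is not how Claude Code implements workflows; it simplifies "converge" to "the first candidate that no refuter can break". The names `first_surviving` and `refuted_by_tests` are hypothetical, and the toy proposers stand in for agents.

```python
from typing import Callable, Iterable, Optional

Candidate = Callable[[int], int]           # a proposed implementation
Proposer = Callable[[], Candidate]         # stand-in for an agent attacking the task
Refuter = Callable[[Candidate], bool]      # returns True if it finds a flaw


def first_surviving(
    proposers: Iterable[Proposer], refuters: Iterable[Refuter]
) -> Optional[Candidate]:
    """Return the first candidate that every refuter fails to break."""
    refuters = list(refuters)
    for propose in proposers:
        candidate = propose()
        if not any(refute(candidate) for refute in refuters):
            return candidate
    return None  # nothing survived; a human needs to look


def refuted_by_tests(fn: Candidate) -> bool:
    """Ground truth supplied by the user: a tiny test suite for abs()."""
    cases = [(-3, 3), (0, 0), (4, 4)]
    return any(fn(x) != want for x, want in cases)


# Two toy "agents": the first proposes a wrong answer, the second a right one.
proposers = [
    lambda: (lambda x: x),
    lambda: (lambda x: -x if x < 0 else x),
]
winner = first_surviving(proposers, [refuted_by_tests])
```

The point of the sketch is that the refuters only help as much as the ground truth they are given.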

vadansky 2 hours ago|||

I don't know, maybe I'm doing it wrong but I feel LLMs add a slop debt, and each agent pass just exacerbates it.

Like I had an LLM implement a spec and it said it was done... Except it had a ton of `casts` everywhere. Okay, my bad, I should have been clear "NO CASTS", so I used the LLM to remove the casts, except it just kept making things more and more complicated and ugly.

It took me taking a break and having a shower thought to realize all the ugliness is because one type should have been broken up into 2, which would remove a ton of generics and code. But Claude never suggested that, it was always "we need at least one cast here, or we need 1000 LOC of generic factories". I tried multiple new sessions with various prompts too.

Maybe one day soon LLMs could pay off their own slop debt but at least right now I don't trust them to write code unseen.

Edit: Maybe the correct action would have been to delete everything and make it rewrite everything from scratch with a clear "NO CASTS EVER" rule. Still, the point is that having an LLM clean up after an LLM doesn't seem to work well enough to just keep it in a loop and never look at what it does.

vinnymac 1 hour ago|||

The problem is that we have an ever growing and large number of constraints, and not following even a single one means the result is sloppy.

I don’t see them fixing this any time soon, and thus human in the loop is a requirement to use these tools effectively. That is unless you love your slot machine dopamine rush enough to ignore quality gates and respect for your peers' time.

highwaylights 2 hours ago||||

This matches my experience.

I've had to put a fair chunk of effort into skills that will run deterministic mechanisms to unslop a codebase (cyclomatic complexity grading has been really helpful here) as invariably some amount of guidance around principles will be missed over time. I've found it does help, though. Certainly I'm getting overall better results from Flash and Sonnet over multiple runs for fairly modest token increases. GPT 5.5 less so, but that's because it scores better in a first pass. I won't really know until I gauge it at the end of my sub month which has been more cost efficient for me all things considered.
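
For anyone wondering what a deterministic complexity grade can look like, here is a rough Python sketch using only the standard `ast` module. It is an approximation of cyclomatic complexity (one plus the number of branch points), not the commenter's actual skill; the function names and the limit are hypothetical, and nested functions are counted toward their parent as well.

```python
import ast
from typing import Dict, List

# Node types that each add one decision point (an approximation).
_BRANCH_NODES = (
    ast.If, ast.For, ast.While, ast.ExceptHandler,
    ast.IfExp, ast.comprehension, ast.Assert,
)


def cyclomatic_complexity(func: ast.AST) -> int:
    """Approximate complexity: 1 + number of branch points in the function."""
    score = 1
    for node in ast.walk(func):
        if isinstance(node, _BRANCH_NODES):
            score += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds two extra paths
            score += len(node.values) - 1
    return score


def grade_source(source: str) -> Dict[str, int]:
    """Map every function name in the source to its approximate complexity."""
    tree = ast.parse(source)
    return {
        node.name: cyclomatic_complexity(node)
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


def over_limit(source: str, limit: int) -> List[str]:
    """Names of functions whose grade exceeds the limit, sorted for stable output."""
    return sorted(name for name, score in grade_source(source).items() if score > limit)


SAMPLE = '''
def simple(x):
    return x + 1

def branchy(x, y):
    if x and y:
        for i in range(x):
            if i % 2:
                y += i
    return y
'''
```

A gate like `over_limit(SAMPLE, limit=4)` gives the agent a fixed, repeatable signal instead of a principle it may forget.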

tomjakubowski 51 minutes ago||||

I've been reading and writing Rust for a long while now, since before 1.0. I'm capable of critically evaluating Rust code. I'm also a happy Claude Code user, mostly for lightweight uses like generating scaffolding, prototyping, and debugging.

The pure LLM, no human intervention vibe-coded PRs on Bun since the vibe-rewrite to Rust contain the worst coding horrors I've seen in 20 years of programming.

Setting aside the quality of the change itself (I would have done it differently, for sure: it is pretty straightforward to build a safe abstraction out of this type), the utterly pointless "source-text consistency test" added here is easily the worst example of "test repeats implementation" I have seen in my career:

https://github.com/oven-sh/bun/pull/30728/files#diff-863477b...

tsunamifury 2 hours ago|||

Ground truth is not consensus, it has to be graded against what actually works for the original goal. Plenty of scenarios with AI and Humans can result in consensus around incorrectness.

adamtaylor_13 2 hours ago||

While pedantically correct, I think the comment above assumed that you've correctly specified the work. If you can't correctly specify your work, AI agents are just going to help you get a non-solution faster.

tsunamifury 2 hours ago||

Isn't coding the act of specifying the work to a processor? And AI agents are supposed to bridge the gap with intelligence from less specified to more specified or possibly even more intelligent and alternate implementations?

adamtaylor_13 2 hours ago||

Yep. And yet, there's still some level of specification you have to do.

eggplantemoji69 1 hour ago|||

This is my experience. Quantity of output is not the issue right now. Quality is. But I’m not sure if this will ever be solved for, given LLMs are non-deterministic sophisticated autocomplete at their core.

Sure, ‘human in the loop’ and all that jazz, but I feel like my knowledge suffers even with this approach. I have to use llms w pinpoint focus to get decent results.

The original copilot completions behavior might be peak llm performance for coding, sans having an agent write boilerplate and such.

root-parent 2 hours ago|||

When this is all finished and done, these coding models will allow you to rewrite the linux kernel in rust, recode Kubernetes in assembly, and create your own web framework in 10 min.

But each prompt will cost your company 10 to 15 million dollars. An extra 20 million if you ask them to review the code and improve the comments.

Jarred 2 hours ago|||

Dynamic workflows, in my experience, make Claude more effective at complex long-running tasks. They help precisely with getting Claude to do the task correctly.

It feels more like a bespoke build system for the specific task/project than prompting a freeform chat.

aloknnikhil 2 hours ago|||

As long as agents are fuzzy (which they will continue to be with the Transformers architecture), the need to validate will continue to exist. I cannot imagine merging code without at least 1 human review.

MeetingsBrowser 2 hours ago||

I've used agents quite a bit and I agree.

The current baseline workflow is something like agent output -> human review -> agent refinement -> human review -> agent refinement -> ...

But agents are capable of making meaningful improvements to their own output. I'm hoping dynamic workflows move towards something like:

agent output -> agent review -> agent refinement -> (cycle to fixed point) -> final human review
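
A minimal Python sketch of that "cycle to fixed point" shape, assuming hypothetical `review` and `refine` callables that stand in for agents. The toy versions below only fix whitespace; the part that matters is the stopping rule (no findings, no progress, or a round cap) before handing off to the final human review.

```python
from typing import Callable, List, Tuple


def refine_to_fixed_point(
    draft: str,
    review: Callable[[str], List[str]],
    refine: Callable[[str, List[str]], str],
    max_rounds: int = 10,
) -> Tuple[str, int]:
    """Loop review -> refine until the reviewer is satisfied or nothing changes."""
    for round_no in range(max_rounds):
        findings = review(draft)
        if not findings:
            return draft, round_no        # reviewer has nothing left to say
        revised = refine(draft, findings)
        if revised == draft:
            return draft, round_no        # no progress: stop instead of burning tokens
        draft = revised
    return draft, max_rounds              # cap reached; escalate to a human


def toy_review(text: str) -> List[str]:
    """Stand-in reviewer: flags tabs and trailing whitespace."""
    findings = []
    if "\t" in text:
        findings.append("tabs")
    if any(line != line.rstrip() for line in text.splitlines()):
        findings.append("trailing whitespace")
    return findings


def toy_refine(text: str, findings: List[str]) -> str:
    """Stand-in refiner: fixes only the first finding each round."""
    if findings[0] == "tabs":
        return text.replace("\t", "    ")
    return "\n".join(line.rstrip() for line in text.splitlines())


final_text, rounds = refine_to_fixed_point("a\t= 1 \nb = 2", toy_review, toy_refine)
```

The round cap and the no-progress check are design choices in this sketch, not something the blog post describes.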

chinathrow 2 hours ago|||

I think for now it's better to convert tokens into code/library code and then work with that for deterministic results rather than relying on Claude being correct or not.

jascha_eng 2 hours ago|||

yes I agree with this, more granular going back, letting me interrupt where it went off the rails, or even editing file reads myself etc would be lovely. Ingesting parts of other conversations would also be cool!

dude250711 2 hours ago|||

I have heard of "token-maxxing" but I have not heard of "correctness-maxxing" or "quality-maxxing".

mirashii 2 hours ago||

Not with those exact terms, but it is certainly being discussed. Wes McKinney said in a recent talk that with current coding agents there’s no longer an excuse for shipping suboptimal code that takes on tech debt. Writing tests has never been cheaper, writing custom fuzzers, linters, and other harnesses that serve as guardrails has never been cheaper. His take is that “we didn’t have enough engineering time to do it right” is no longer an excuse, and the only excuses left are that you don’t know any better or you have bad taste.

encoderer 1 hour ago||

The answer for me has actually been more tokens, and creating even more layers of automated verification

mil22 3 hours ago||

Interesting to note, not sure if this was known publicly before today's blog post:

Rewriting Bun with dynamic workflows

An example of what dynamic workflows can unlock at scale is the recent rewrite of Bun. Jarred Sumner used dynamic workflows to port Bun from Zig to Rust with 99.8% of the existing test suite passing, roughly 750,000 lines of Rust, and eleven days from first commit to merge. One workflow mapped the right Rust lifetime for every struct field in the Zig codebase. The next wrote every .rs file as a behavior-identical port of its .zig counterpart, hundreds of agents working in parallel with two reviewers on each file. A fix loop then drove the build and test suite until both ran clean. After the port landed, an overnight workflow addressed unnecessary data copies and opened a PR for each for final review. While not yet in production, all of this was handled by dynamic workflows. Jarred will be writing about this more in the future.

SkyPuncher 3 hours ago|

I'm extremely skeptical that dynamic workflows had anything to do with this. I've been able to refactor one of the most complicated parts of our code base with similar results.

Mechanical refactors are relatively straightforward for agents.

jeswin 2 hours ago||

> I've been able to refactor one of the most complicated parts of our code base with similar results. Mechanical refactors are relatively straightforward for agents.

A rewrite of bun in Rust is unlikely to be a trivial mechanical refactor. And if you are not sharing what the complicated parts were, or how big it is, how do we assess that the task was similar?

Unless you are intimately familiar with the bun codebase and you've already made that assessment.

trjordan 3 hours ago||

It feels like we're far past the point where having AI do more faster is helpful.

It's telling that they used "rewrite Bun in Rust" as the proof point here. It's cool! But the vast majority of software engineering doesn't start with tens of thousands of tests, where making them pass is the whole job.

In my experience, AI still drifts from what I meant it to do on anything bigger than building a widget. My time is spent suspiciously reviewing output for changes the agent snuck in, or invariants it broke. I talked with a friend recently where the agent broke the test harness badly enough that none of the tests mattered for 3 weeks. They did pass, though, so CI never complained.

There's something at the intersection of context engineering, managing that sloppy pile of markdown plans, and good old-fashioned system understanding that's the real bottleneck.

bcherny 3 hours ago||

A few of us from the Claude Code team will be hanging around if anyone has questions! Very excited for this launch -- dynamic workflows have been a game changer for engineering here at Anthropic. Can't wait to hear what you think.

_boffin_ 6 minutes ago||

This isn't related to Dynamic Workflows, but more on the telemetry / observability side of things.

Why'd you guys not want to allow the traceparent in hooks, but allowed the session.id? Any plans on changing that?

bryan0 2 hours ago|||

Thanks to you and the anthropic team for developing such exciting tools! The blog post seems to position workflows for “breadth”: generating fixes / refactors against large code bases. What about for “depth”: developing specific new features and functionality end-to-end? I’ve struggled to make this work reliably using the current experimental agent teams. Does this replace or augment that functionality?

bcherny 2 hours ago||

Yes, it also helps! That's a place where raw model capability is the most helpful, but we do find that some dynamic workflow configurations can be helpful too.

bryan0 2 hours ago||

Cool! If you can point to any examples of those types of workflow configurations I’d be super interested. For example, to have a team of agents review a PR and iterate on it until all requirements are met including UX, security and product functionality goals. If they could “converge” to a solution like workflows seem to be designed for that would be amazing.

hbarka 2 hours ago|||

Hi Boris. Love the velocity of features. Are you planning on adding a secrets manager? Enterprise workflows almost always require an encrypted parameter or calling a secret.

tomjakubowski 39 minutes ago||

Why should secrets be built in? What's the issue with tool use and something like 1password's or Vault's CLI?

rsstack 3 hours ago|||

Will you document how to (AI-)author and share reusable workflows between team members, to ensure some consistency of quality?

Maybe blasphemy, but will workflows be able to use non-Anthropic LLMs (e.g., delegating some steps to local models, but design and review by Claude)?

bcherny 2 hours ago||

Yes, more docs + technical details coming soon.

vblanco 2 hours ago|||

I have my own version and the workflow keyword conflicts with it rather heavily. Will there be a way to disable that prompt section/keyword?

bcherny 1 hour ago||

Yep! Set disableWorkflows:true in your settings.json
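
If you would rather script that change, here is a small Python sketch. The `disableWorkflows` key is taken from the comment above; the helper name is hypothetical, the path is left to the caller because the file location is not stated here, and the sketch assumes the file is plain JSON with no comments.

```python
import json
from pathlib import Path


def set_disable_workflows(settings_path: Path, disabled: bool = True) -> dict:
    """Set `disableWorkflows` in a settings.json, keeping any other keys."""
    if settings_path.exists():
        settings = json.loads(settings_path.read_text())
    else:
        settings = {}
    settings["disableWorkflows"] = disabled
    settings_path.write_text(json.dumps(settings, indent=2) + "\n")
    return settings
```

Pass it the path of whichever settings.json your install actually reads.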

vblanco 1 hour ago||

thank you

franze 2 hours ago|||

just wanted to say thank you. just did a 2-day "ai computer use" workshop - think a virtual desktop on hetzner with claude code in yolo mode, a github account, vercel, and logged in to a google account, with claude holding all the credentials - and then let a mix of marketing / product managers / sales / customer support loose on it. 2k token budget ... and just let them see it do magic again and again.

thx for all that amazing tech and safe ai

thallavajhula 3 hours ago|||

Hi Boris! Thanks for Claude Code.

Is there an example of how y'all use Dynamic Workflows internally that you could share with the rest of us here so that we can mimic something similar?

bcherny 2 hours ago||

Hey, yep. A few things I personally used dynamic workflows for over the last few weeks:

1. Autonomously landed 20+ optimizations to reduce Claude Code's token usage by ~15%

2. Ported tree-sitter, color-diff, yoga-layout, and a number of other WASM and Rust native modules to TypeScript, improving CPU and memory use by 2-10x in the process

3. Made our CI faster, and repeatedly found and fixed flaky tests (with /loop)

4. Migrated from regex-based bash static analysis to tree-sitter, reducing false positive permission prompts by 45%

5. Reduced Claude Agent SDK startup time by 61%, by repeatedly profiling and optimizing the startup path, putting up a number of PRs in the process

6. Shipped 69 code simplification PRs, deleting >10k lines of code

sangeeth96 2 hours ago|||

> Ported tree-sitter, color-diff, yoga-layout, and a number of other WASM and Rust native modules to TypeScript, improving CPU and memory use by 2-10x in the process

Curious to learn more on this (unless there’s a write-up in the works). I’m naive on this matter but:

1. is this because it’s higher cost when passing objects back and forth across the JS/native boundary?

2. Does this have anything more specific to do with use of Bun?

3. is the stance for claude code then to keep all the deps in raw TypeScript?

4. How do you folks keep these ported deps up-to-date?

guybedo 42 minutes ago||||

this feels more like a PR statement than a description of how you used the tool though

theLiminator 1 hour ago||||

Is there not a reason to instead port claude code to rust? Do you have internal benchmarks that show that claude code is better at typescript than rust?

mkw5053 2 hours ago||||

Very cool. What % of the CC team's engineering would you say goes into QoL (as opposed to new feature development)? Obviously some live in a grey area, while others are more clear like making CI faster.

rahkiin 2 hours ago||||

You _reduced_ its _efficiency_? Why do you make CC more inefficient?

isoprophlex 2 hours ago|||

Maxxing everything is all the rage. Gotta cpumaxx or bossman isnt getting his money's worth

bcherny 2 hours ago|||

Typo! Edited

JimJohn4292 2 hours ago|||

Boris, what are your thoughts on WASM as a technology and its practical implications for AI in the future?

stvpwrs 2 hours ago|||

Will workflows be reusable? I have a big use case of sharable and repeatable workflows for projects. Especially if this comes to Cowork.

bcherny 2 hours ago||

Yes!

andrewmutz 2 hours ago||

Any idea how soon dynamic workflows might be available in Cowork?

m0meni 2 hours ago|||

What language are the workflows in? Curious what you settled on. And are they running in the cloud or locally?

bcherny 2 hours ago||

JavaScript, running locally or in the cloud.

wilg 2 hours ago|||

I tried creating a workflow in Claude 1.9255.2 (1dc8f7) 2026-05-27T01:57:20.000Z

and got

API Error: 400 messages.3.content.11: `thinking` or `redacted_thinking` blocks in the latest assistant message cannot be modified. These blocks must remain as they were in the original response.

Tried again in

Claude 1.9659.1 (193bcb) 2026-05-28T16:22:15.000Z also but may need a new chat

bcherny 2 hours ago||

Looking

wilg 2 hours ago||

Still seeing it in new threads with Claude 1.9659.1 (193bcb) 2026-05-28T16:22:15.000Z

wilg 2 hours ago|||

Can you please fix the issue where like 99.99999% of the time Claude tries to launch a subagent of its own accord it gets "Prompt is too long" and tries several more times, then gives up and does it without the subagent? Big waste of time and tokens and not getting almost any subagent advantages. Not kidding that this happens about 100 times a day.

k2xl 3 hours ago|||

How do you guys plan feature support between the CLI and Claude Desktop?

bcherny 2 hours ago||

We generally build features into the Claude Agent SDK, which is shared by CLI, Desktop, VSCode, and cloud.

unshavedyak 1 hour ago||

VSCode has an official client? Given IDE usage is being restricted from Claude Code via the CC SDK tokens going to the Claude API rather than your CC Subscription, i'm unclear which IDEs can actually use claude code now.

Eg is Zed capable of using a Claude Code Subscription?

tsunamifury 3 hours ago|||

This is a really disappointing release for such a promising technique. Long walks with fanned vectors can actually be token optimizing vs token burning when combined with self grading each agent along the walk and compared to manual long coding walks to solve first pass problems. But instead this frames it (assumptively) as a tokenmaxxing strategy. There are also many other strategies that can prove efficiency and wider solution consideration with consensus, but nothing here explains why this is an improvement or better than other techniques.

It's like you guys aren't even aware of the primary problem you are all facing: your token burns aren't paying off anymore against standard coding -- and looking net negative. I have to ask, are you this unaware of your core problem set here?

There aren't any examples, proofs, or scenarios that show an improvement in the complexity or reliability of the solution, or in the efficiency of the path to the solution. I'm baffled.

vld_chk 3 hours ago||

Quite a thing to use the Bun rewrite to Rust as the example of dynamic workflows, when it is now considered an anti-pattern that leads the team to stop supporting the tool because they cannot properly understand and navigate 1M vibe-coded lines of Rust

ncphillips 1 hour ago||

I just hit my Claude Max limit for the first time _ever_ thanks to workflows lol

Like 90 agents ran to do a code review of a fairly small package I have.

They're really looking for us to increase token usage aren't they?

tomjakubowski 37 minutes ago|

This is a fundamental incentive issue with any company that does all of training models, building harnesses for them, and offering them as a service.

Deukhoofd 3 hours ago||

I'm going to be honest, this very much reads like an exciting new way to burn up as many tokens as possible. Large amounts of parallel agents that all have all their work double-checked by multiple other agents, and that keeps running for a longer period of time?

I feel like there are more efficient ways to tackle the issues given.

ithkuil 19 minutes ago|

Possibly. But otoh one cannot complain that agents don't produce high quality code while at the same time not allowing them to thoroughly go through all the steps required to produce high quality code

tra3 3 hours ago||

I say this as someone who's found LLMs incredibly beneficial.

Is this a way to increase token burn?

I thought we covered this with Claude's C compiler. What changed?

mattas 3 hours ago|

My initial reaction was that this is tokenmaxxing disguised as a product.

aabdi 2 hours ago||

wrote something similar for my own use/work stuff; seems everything is converging towards similar ideas.

IMO, this style of workflow/agentics is what all SWE will look like long term. Automate everything into a big pipe-y thing. How it's gonna be modelled is up in the air though. lots of different approaches:

mine: https://github.com/portpowered/you-agent-factory

https://github.com/ComposioHQ/agent-orchestrator

https://github.com/gastownhall/gastown

https://github.com/openai/symphony

afro88 2 hours ago|

I tried this out yesterday - lucky enough to have access through EAP at work. The workflows that are generated are quite good - smart parallelisation and phasing. End results for larger chunks of work are also much better, which I attribute to more of the work having clean context windows (Opus 4.7 is unusable past 200k conversation length, and each subagent ends up using less than that IME). They also seem to have a validation phase hint in the workflow generator which also helps a lot. Speed is a bonus.

You can achieve a similar result manually prompting to use subagents, yes. But the TUI for in flight dynamic workflows is really nice - great visibility into exactly what's happening.

Honestly, for anything larger than a 1 shot PR, it's worth firing off a workflow for better automatic context management alone (more work done in the first 20% sweet spot)
