Posted by pramodbiligiri 3 days ago
"Sharing knowledge" is one of the first phrases in the article, and highlighted as a key benefit of code review. But the loss to human-capital from this process is never examined in the post.
> Trivial reviews (typo fixes, small doc changes) cost 20 cents on average
They did around 25,000 of these runs (about 20% of total). So CF spent $5k in the period making language models run through PRs which were <10 lines long. I get that CF engineers are paid well, but the labour cost of having an intern/entry level engineer spend ~30-60s looking through these is likely close to $0.20, and that engineer builds some human-capital while they're at it.
we are finding lots of value in self review. its the “imagine you are doing a synchronous paired review with someone - anything that is difficult to explain, has a code smell, doesnt fit the architecture of the system around you, write a comment.” then at the end, agents do a good job of looping over PR comments.
the second thing would be a guided, educational code review tool - there are a few attempts at this, but nothing that has a good enough interface to actually stick. organize hunks by semantic importance, spend some tokens exploring the surrounding systems, showing how new code, public apis and data model flow with the existing design, and allow a human to traverse larger PRs more quickly.
thank you to cloudflare for publishing this.
> agents do a good job of looping over PR comments
This is the easy part. Most harnesses enable some sort of integration now, so you can actually create a smooth local experience around this as well - better code before it ships to more costly review or bloats PR threads.
> guided, educational code review tool
This is a bit tougher, and I find the main harness chat tends to work best. I learn better when I'm more engaged and aware of what I'm asking. It's easy to stick a code tour type of thing on a screen. It's hard to really nail the right attention and learning mechanism around it.
I’d prefer to have that happen as some sort of pre commit hook, before a merge request is sent. The feedback loop might be a bit faster and the process might produce less noise this way.
It’s exclusively been the other way around where having a smaller number of larger squished commits (post merge) that’s made the project be more maintainable.
Pre-commit hooks should be fast, as it's something you'd do (normally) a few dozen times a year. I don't believe sending a review job to a remote agent is fast, nor will waiting for a review to finish a commit be good for anyone.
CI on the other hand can be slower and runs async, it's fire and forget so you can switch tasks.
If noise is an issue, one possible solution is to create a merge request, have the tools review it, make the fixes, rewrite history like you did it perfectly the first time ("fix" commits are noise), then create a new one for human review.
I think approaches like this don't need to run other than locally. Maybe integrated as pre-push hook. The system is nondeterministic, so it's at odds with the purpose of CI.
The ROI here is so high that I don't mind using the strongest model available for the actual code review. I don't trust Sonnet and such. Just let Opus or GPT 5.5 do the whole thing and pay a bit more for less complexity.
would love to look into it if any part of it is open source
I had the same problem in my recursive agent harness. It would always come back, but it could sometimes take up to 10 minutes depending. I fixed this by adding a required "purpose" argument to every tool and call/return event. As the recursive evaluation proceeds, every single thing that happens streams incremental purpose text to the user's browser (also using the magic of JSONL for this). The incremental progress events contain the purpose and a detail section (tool arg JSON) that the user can expand/collapse.