Posted by rs545837 6 hours ago
- do any concurrent branches touch this function?
- what new uses did this function accrete recently?
- did we create any actual merge conflicts?
Almost LSP-level querying, involving versions and branches.
Beagle is a revision control system like that [1]. It is quite early stage, but the surprising finding is: instead of being a repository of source code blobs, an SCM can be the hub of all activities. Beagle's architecture is extremely open, on the assumption that a lot of things can be built on top of it. Essentially, it is a key-value DB: keys are URIs and values are BASON (binary mergeable JSON) [2]. Can't be more open than that.
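As a toy illustration of that shape (the key scheme and merge rule here are invented for the example, not taken from Beagle's API or BASON's actual semantics), a URI-keyed store with mergeable values could look like:

```python
# Toy model of a URI-keyed key-value store with mergeable values.
# Illustrative only: not Beagle's API, not BASON's merge semantics.
store = {}

def put(uri: str, value: dict) -> None:
    # Merge the new value into the old one key-wise instead of
    # overwriting the whole blob -- the "mergeable JSON" idea.
    store[uri] = {**store.get(uri, {}), **value}

put("repo://proj/src/main.c#parse", {"lang": "c"})
put("repo://proj/src/main.c#parse", {"callers": ["main"]})
# both fields now live under the one URI key
```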
[1]: https://github.com/gritzko/librdx/tree/master/be
[2]: https://github.com/gritzko/librdx/blob/master/be/STORE.md
The pragmatic reason weave works at the git layer: adoption. Getting people to switch merge drivers is hard enough; getting them to switch VCSs is nearly impossible. So weave parses the three file versions on the fly during merge, extracts entities, resolves per-entity, and writes back a normal file that git stores as a blob. You get entity-level merging without anyone changing their workflow.
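For anyone curious what that hookup looks like, git lets you route path patterns to a custom merge driver via config plus `.gitattributes`. A hypothetical registration (weave's actual command name and flags may differ; `%O`/`%A`/`%B` are git's placeholders for ancestor, ours, and theirs, and git takes the rewritten `%A` file as the result):

```
# .git/config -- register the driver (the command line is illustrative)
[merge "weave"]
    name = entity-level merge via weave
    driver = weave merge %O %A %B

# .gitattributes -- route files to it
*.py merge=weave
```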
But you're pointing at the ceiling of that approach. A VCS that stores ASTs natively could answer "did any concurrent branches touch this function?" as a query, not as a computation. That's a fundamentally different capability. Beagle looks interesting, will dig into the BASON format.
We built something adjacent with sem (https://github.com/ataraxy-labs/sem) which extracts the entity dependency graph from git history. It can answer "what new uses did this function accrete" and "what's the blast radius of this change" but it's still a layer on top of git, not native storage.
I built lix [0] which stores ASTs instead of blobs.
Direct AST writing works for apps that are "AST-aware", and I can confirm it works great.
But all the software out there just writes bytes at the moment.
The binary -> parse -> diff pipeline is too slow.
The parse and diff steps need to get out of the hot path. That semi-defeats the idea of a VCS that stores ASTs, though.
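One way to pull parsing out of the hot path is to memoize the parse by content hash: git blobs are content-addressed, so a cached parse can never go stale. A minimal sketch, using the stdlib `ast` module as a stand-in for a real parser:

```python
import ast
import hashlib

_cache = {}  # content hash -> {entity name: source text}

def entities(source: str) -> dict:
    """Extract top-level functions/classes, memoized by content hash.

    Because the key is derived from the bytes themselves, a repeated
    merge over the same blob skips the parse entirely.
    """
    key = hashlib.sha1(source.encode()).hexdigest()
    if key not in _cache:
        tree = ast.parse(source)
        _cache[key] = {n.name: ast.get_source_segment(source, n)
                       for n in tree.body
                       if isinstance(n, (ast.FunctionDef, ast.ClassDef))}
    return _cache[key]
```

A real system would persist the cache alongside the object store instead of holding it in memory, but the invariant is the same: parse each blob at most once.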
There is room for improvement, but that is not a show-stopper so far. I plan to round-trip the Linux kernel with full history; that should surface all the bottlenecks.
P.S. I checked lix. It uses a SQL database. That solves some things, but also creates an impedance mismatch; it must be a 10x slowdown at least. I use key-value storage and a custom binary format, so it works nicely. I can go one level deeper still and use a custom storage engine; it will be even faster. Git is all custom.
The part that's been keeping me up at night: this becomes critical infrastructure for multi-agent coding. When multiple agents write code in parallel (Cursor, Claude Code, Codex all ship this now), they create worktrees for isolation. But when those branches merge back, git's line-level merge breaks on cases where two agents added different functions to the same file. weave resolves these cleanly because it knows they're separate entities. 31/31 vs git's 15/31 on our benchmark.
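The "two agents add different functions to the same file" case is exactly where per-entity three-way merge pays off. A simplified sketch of the idea, using the stdlib `ast` and top-level functions only (this is the general technique, not weave's actual algorithm):

```python
import ast

def entity_map(src: str) -> dict:
    """Map each top-level function name to its source text."""
    tree = ast.parse(src)
    return {n.name: ast.get_source_segment(src, n)
            for n in tree.body if isinstance(n, ast.FunctionDef)}

def merge3(base: str, ours: str, theirs: str):
    """Per-entity three-way merge: a conflict arises only when both
    sides changed the *same* entity in different ways."""
    b, o, t = entity_map(base), entity_map(ours), entity_map(theirs)
    merged, conflicts = {}, []
    for name in {**o, **t}:
        bv, ov, tv = b.get(name), o.get(name), t.get(name)
        if ov == tv:                 # identical on both sides
            merged[name] = ov
        elif tv in (None, bv):       # theirs didn't touch it -> take ours
            merged[name] = ov
        elif ov in (None, bv):       # ours didn't touch it -> take theirs
            merged[name] = tv
        else:                        # both changed it differently
            conflicts.append(name)
    return "\n\n".join(v for v in merged.values() if v), conflicts
```

Two agents appending different functions land in the "other side didn't touch it" branches, so the merge is clean; a line-level merge sees overlapping edits near the end of the file and conflicts.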
Weave also ships as an MCP server with 14 tools, so agents can claim entities before editing, check who's touching what, and detect conflicts before they happen.
I also think that this approach has a lot of potential. Keep up the good work sir.
https://x.com/agent_wrapper/status/2026937132649247118 https://x.com/omega_memory/status/2028844143867228241 https://x.com/vincentmvdm/status/2027027874134343717
Let's go through GitHub's ToS, since you suspect a violation, so I can help you understand them.
> What violates it:
1. Automated bulk issues/PRs on repos we don't own
2. Fake stars or engagement farming
3. Using bot accounts
We own the repo, there's not a single fake star, and I don't even know how to create a bot account lol.

> Scenario when we run out of free tokens.
OpenAI and Anthropic have been sponsoring my company with credits because I am trying to architect new software for a post-AGI world, so if I run out I will ask them for more tokens.
Dude, did you just call me AI-generated haha. I've been actively using weave for a GUI I've been building, for blazingly fast diffs.
https://x.com/Palanikannan_M/status/2022190215021126004
So whenever I run into bugs I've patched locally in my clone, I try to let the clanker raise a PR upstream; insane how easy things are now.
It's also based on tree-sitter, but probably an otherwise more baseline algorithm. I wonder whether that "entity-awareness" actually brings something to the table in addition to the AST.
edit: man, I tried searching this thread for mentions of the tool a few times, but apparently its name is not mergigraf
Cheers,
> git merges lines. mergiraf merges tree nodes. weave merges entities. [1]
I've been using mergiraf for ~6 months and tried to use it to resolve a conflict from multiple Claude instances editing a large bash script. Sadly, neither supports bash out of the box, which makes me suspect that classic merge is better in some cases like this...
Will try adding the bash grammar to mergiraf or weave next time
The key difference: mergiraf matches individual AST nodes (GumTree + PCS triples). Weave matches entities (functions, classes, methods) as whole units. Simpler, faster, and conflicts are readable ("conflict in validate_token" instead of a tree of node triples).
The other big gap: weave ships as an MCP server with 14 tools for agent coordination. Agents can claim entities before editing and detect conflicts before they merge. That's the piece mergiraf doesn't have.
On bash: weave falls back to line-level for unsupported languages, so it'll work as well as git does there.
Adding a bash tree-sitter grammar would unlock entity-level merge for it. I can work on it tonight if you want it urgently.
Cheers,
I haven't tried it but this sounds like it would be really valuable to me.
For diffing arbitrary files outside git, we built sem (https://github.com/ataraxy-labs/sem), which does entity-level diffs. `sem diff file1.py file2.py` shows which functions were changed, added, or deleted, rather than line-level changes.
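The shape of that output is roughly the following (a stripped-down sketch using Python's stdlib `ast`, not sem's actual implementation):

```python
import ast

def top_level(src: str) -> dict:
    """Name -> normalized AST dump for each top-level function/class."""
    return {n.name: ast.dump(n) for n in ast.parse(src).body
            if isinstance(n, (ast.FunctionDef, ast.ClassDef))}

def entity_diff(old_src: str, new_src: str) -> dict:
    old, new = top_level(old_src), top_level(new_src)
    return {
        "added":   sorted(new.keys() - old.keys()),
        "deleted": sorted(old.keys() - new.keys()),
        # same name, different body or signature
        "changed": sorted(k for k in old.keys() & new.keys()
                          if old[k] != new[k]),
    }
```

Comparing AST dumps rather than raw text means formatting-only edits don't show up as changes, which is the point of diffing at the entity level.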