GodelNumbering 6 hours ago

Interesting things Dirac does:

1. Uses an optimized version of Hash-Anchored edits for file editing (https://dirac.run/posts/hash-anchors-myers-diff-single-token)

2. Utilizes language's AST to decide what to fetch into context, entirely avoids large code file reads

3. Batches all operations. Does a large number of reads/edits simultaneously (you can see a video demo for deepseek-v4-flash here https://www.reddit.com/r/LocalLLaMA/comments/1suhdki/tested_...)

4. Allows the model to execute code to analyze things on the fly, so the model can simply write bash/python/perl script to accomplish things where appropriate

5. A lot of context curation and opportunistic context updates, i.e. put into context anything that you are certain the model would ask for next
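To make (1) concrete, here's a minimal sketch of the hash-anchored editing idea (hypothetical `anchor`/`apply_edit` names, not Dirac's actual implementation): each line is shown to the model prefixed with a short content hash, and edits reference the hash instead of a line number, so an edit stays valid even if the file shifts underneath it.

```python
import hashlib

def anchor(line: str, length: int = 4) -> str:
    """Short content hash used as a stable per-line anchor."""
    return hashlib.sha256(line.encode()).hexdigest()[:length]

def render_with_anchors(text: str) -> str:
    """Show the file to the model with an anchor prefix on each line."""
    return "\n".join(f"{anchor(l)}| {l}" for l in text.splitlines())

def apply_edit(text: str, anchor_id: str, replacement: str) -> str:
    """Replace the line whose anchor matches, wherever it now lives.
    Caveat: duplicate lines share a hash, so a real scheme needs
    disambiguation (e.g. nearest-match or per-occurrence anchors)."""
    lines = text.splitlines()
    return "\n".join(replacement if anchor(l) == anchor_id else l for l in lines)

src = "def add(a, b):\n    return a - b"
fixed = apply_edit(src, anchor("    return a - b"), "    return a + b")
```

The single-token trick in the linked post presumably maps these hashes onto the tokenizer's vocabulary so each anchor costs one token; the sketch above ignores that detail.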

deskamess 5 hours ago | parent | next [-]

I always wondered why ASTs weren't a bigger part of both editing and scoping changes/parsing code. I thought I read an article claiming 'grep' was just as effective, and it kind of made sense for the case they were discussing.

GodelNumbering 5 hours ago | parent | next [-]

Grep is effective for the most part, except in situations where you have a huge codebase and the thing you're looking for is used in too many places, both as a symbol and as a non-symbol.

Another annoying thing about plain grep is that LLMs often end up pulling in bundled packages, where a single line can be large enough to ruin the context window.
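One cheap guard against the bundled-package problem is to skip any file with suspiciously long lines before matching, since minified bundles pack everything onto a few huge lines. A sketch (the `safe_grep` helper and its 500-char threshold are made up for illustration, not anything Dirac does):

```python
def greppable(text: str, max_line_len: int = 500) -> bool:
    """Heuristic: minified/vendored blobs have very long lines."""
    return all(len(line) <= max_line_len for line in text.splitlines())

def safe_grep(files: dict[str, str], needle: str) -> list[tuple[str, int, str]]:
    """Grep over in-memory files, skipping anything that looks minified."""
    hits = []
    for name, text in files.items():
        if not greppable(text):
            continue  # never let a bundle's one giant line into context
        for i, line in enumerate(text.splitlines(), 1):
            if needle in line:
                hits.append((name, i, line.strip()))
    return hits

files = {
    "app.py": "from tree import Node\n",
    "bundle.min.js": "var Node=" + "x" * 600 + ";",
}
hits = safe_grep(files, "Node")
```

`rg --max-columns` gets you a similar effect on the command line by truncating long matches instead of emitting them whole.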

embedding-shape 5 hours ago | parent [-]

> Grep is effective for the most part

It's very effective in well-written, well-designed codebases where concepts are distinct enough not to be named the same as everything else, so grepping for symbols gives you good search results.

In projects where the god object or core concepts have generic names like "Tree" or "Node" that are used everywhere, it tends to be next to impossible to search with grep and friends.

sigbottle 2 hours ago | parent | prev [-]

It's not intuitive to humans, even after learning parsing theory. I can do basic name refactorings. I've even written Neovim plugins to do one specific thing with the AST (DFS down and delete one subtree, which I understand). Those are fine.

I would not be comfortable doing an on-the-fly "rewrite all subtrees that match this pattern" kind of edit.

It seems like a good tool for LLMs, though.

spullara 20 minutes ago | parent [-]

"rewrite all subtrees that match this pattern" works really well in jetbrains, they call it structure search-and-replace.

messh 3 hours ago | parent | prev | next [-]

Anchor-based editing requires injecting new anchors into the context, and Dirac does so via a diff. So how is this more efficient (token-wise) than search and replace, even at a single token per hash? Also, code is read more than written, so these costs just add up. I once experimented with stable anchors, albeit longer than a single token, and found it a downgrade.

My conclusion is that the efficiency Dirac sees comes mainly from showing a file skeleton by default.

hedgehog 34 minutes ago | parent [-]

I'm not sure one way or the other, but I've been using a related tool called Tilth by another poster here. It doesn't do anchor-based editing, but it does do syntax-aware search and will, e.g., report the line range for function definitions, provide file outlines with line numbers on a filename match, etc.

https://github.com/jahala/tilth
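The syntax-aware outline idea is easy to sketch. Here's a toy version using Python's stdlib `ast` (an illustration of the concept only; Tilth's actual implementation may differ and likely uses a multi-language parser): it reports a name plus start/end line for each top-level definition, which is exactly the kind of summary a model can use instead of reading the whole file.

```python
import ast

def outline(source: str) -> list[tuple[str, int, int]]:
    """Report (name, start_line, end_line) for each top-level def/class."""
    tree = ast.parse(source)
    return [
        (node.name, node.lineno, node.end_lineno)
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    ]

src = "def f():\n    return 1\n\nclass C:\n    def m(self):\n        pass\n"
result = outline(src)  # [('f', 1, 2), ('C', 4, 6)]
```

Given such an outline, the agent can fetch just the line range of the one function it cares about.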

rgbrgb 3 hours ago | parent | prev | next [-]

It would be really cool to do a causality investigation to determine which of these boosts it so much and quantify how much each matters. Who knows, they may all interact in a sum-is-greater-than-its-parts way that only improves the score when shipped all together.

sally_glance 3 hours ago | parent | prev | next [-]

Is there a complete list of the tools somewhere? I'm interested in how you chose to expose the AST specifically. In my own harness attempts I wanted to keep the number of tools absolutely minimal and briefly experimented with including an AST lib to use via an execute_python tool (plus some examples in the system prompt). Results were mixed though, with most models preferring ripgrep.

UncleOxidant 4 hours ago | parent | prev | next [-]

> Utilizes language's AST to decide what to fetch into context,

Does that mean it's only going to work with certain languages for which it has parsers available?

GodelNumbering 3 hours ago | parent | next [-]

It uses tree-sitter wasms. Currently, 14 languages are available (https://github.com/dirac-run/dirac/tree/master/src/services/...)

The agent would work even without a language parser; it's just that the AST-based functionality won't work.

gavinray 3 hours ago | parent | prev [-]

Yes

drakythe an hour ago | parent | prev | next [-]

How are the two-token anchors chosen when the initial 1700 single-token anchors run out? I'm assuming just two-token combinations drawn from the 1700.
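Formalizing the commenter's guess (this is speculation about Dirac's scheme, not its documented behavior): a generator that exhausts the single-token anchors first and then falls back to ordered pairs would look like this.

```python
from itertools import product

def anchor_vocab(singles: list[str]):
    """Yield all single-token anchors, then two-token pairs once they run out."""
    yield from singles
    for a, b in product(singles, repeat=2):
        yield a + b

# With 1700 singles, pairs extend the space by 1700^2 = 2,890,000 anchors,
# far more than any single file needs.
gen = anchor_vocab(["aa", "bb", "cc"])
anchors = [next(gen) for _ in range(5)]  # ['aa', 'bb', 'cc', 'aaaa', 'aabb']
```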

jbellis 3 hours ago | parent | prev | next [-]

> Batches all operations. Does large number of reads/edits simultaneously...

I wasn't sure what this meant, so I looked at the source. It seems to be referring to tool APIs being designed to take multiple targets as a list parameter, instead of hoping the model makes appropriately parallel tool calls. (This matches my experience, btw: models are reluctant to make a large number of parallel calls, and this seems more pronounced with weaker models.)

verdverm 3 hours ago | parent [-]

I think Anthropic may have mentioned this first. This pattern is also something my custom agent's tools are designed around; I'm pretty sure I picked it up from them.

blurbleblurble 4 hours ago | parent | prev | next [-]

Did you consider incorporating ast-grep or gritql?

Congratulations, great work.

sally_glance 3 hours ago | parent | next [-]

Can't speak for OP, but I tried providing ast-grep in the execution context of an execute_bash tool, and even with pretty aggressive steering most models just don't seem to use it much. More expensive/SOTA models or higher reasoning settings increase the chances, but lower speed and raise cost. Maybe due to training bias toward exploration tasks?

blurbleblurble 3 hours ago | parent [-]

Yes, I've tried this passive approach too and didn't dig much further after that. I thought maybe they'd figured out something more intentional in the prompting to enable these kinds of approaches.

sally_glance 3 hours ago | parent [-]

I have a hunch that a model's proficiency with a given CLI tool correlates strongly with how many Stack Overflow answers and blog posts provide examples of it...

blurbleblurble 3 hours ago | parent [-]

My sense is that we're at a tipping point where instruction following is getting good enough to disrupt these old habits.

GodelNumbering 2 hours ago | parent | prev [-]

Not really, but interested in trying them out for a future version, especially gritql.
