I have a dumb performance question.

Why when asking a model to change text in a minor way; are we not asking it to generate the operational transformations necessary to modify the text, and then just executing the ot on the existing text vs reproducing every token? Maybe tools are doing that more than I realize?

▲

XYen0n 4 hours ago | parent | next [-]

The only thing a model can output is tokens; to achieve this, a tool of converting tokens into operational transformations is required. For example, I have an ast-grep skill, it will instruct the model to generate ast-grep rules and run ast-grep to perform file modifications.

	▲	basch 3 hours ago \| parent [-]
		I am saying to directly output the operational transformation instructions as the tokens. You’re essentially telling it to “write the diff” and then applying the patch. [retain(8), delete(6), insert("very very"), retain(10)]

▲

sigmoid10 3 hours ago | parent | prev | next [-]

The simple answer is: because it is not necessary to achieve the same final output. Most LLMs today are trained as autoregressive token predictors. They fundamentally can't work any other way. But we know how to train them really well and they have many applications beyond editing text. Diffusion LLMs exist too, which work a bit closer to what you describe, but they are not yet at the same level of intelligence since training methods are not that mature and they are generally less flexible as well.

▲

basch 3 hours ago | parent [-]

So predict the tokens of the operational transformation.

I just asked: Write the operational transformation sequence and command to turn “this is really beautiful” to “this is very very beautiful”

and in return got: You can map this out by moving a virtual cursor across the text and telling it what to keep, remove, or add. You start by retaining the first eight characters to keep "this is " untouched. Then you delete the next six characters to remove the word "really". In that exact spot, you insert the nine characters for "very very". You finish the operation by retaining the final ten characters, which preserves the space and the word "beautiful". You can code this specific command sequence as [retain(8), delete(6), insert("very very"), retain(10)].

In a large paragraph of text I would expect it to be way quicker and cheaper to generate “[retain(800), delete(6), insert("very very"), retain(10000)]” than repredict the entire remainder of the unedited text.

▲

sigmoid10 2 hours ago | parent [-]

Sounds easy, but isn't in practice. You can look at the edit text file tool in va code copilot for example to see how complicated that can get: https://github.com/microsoft/vscode-copilot-chat/tree/9e668c...

	▲	basch 2 hours ago \| parent [-]
		I have no idea when I’m being lied to anymore but allegedly Aider and Cursor work the way I described, although cursor is using a second model to apply the edit.

▲

jfim 2 hours ago | parent | prev | next [-]

I've seen Claude use sed to edit files on other hosts instead of copying the file back and forth to edit it. Not quite full blown OT but it's going in that direction.

▲

cryptoz 4 hours ago | parent | prev [-]

This is the approach I take with code edits to existing files at Code+=AI; I wrote a blog post with a simple example of AST modification to illustrate: https://codeplusequalsai.com/static/blog/prompting_llms_to_m...