Counter-point: I built an agent that can only interface with Kakoune, a much less common and more challenging situation for an LLM to find itself in, and Gemma4-A4B 8bit quantized does remarkably better in actually figuring out how to get text in buffers than Qwen3.6-35B-A3B in a similar class as Gemma4 A4B.

Now, is this the usual use case? No, it's a benchmark I created specifically in order to put LLMs in situations where they can't just blast out their bash commands without having to interface with something else and adapt.

▲

celrod 2 days ago | parent [-]

Fellow kakoune user here. I'm curious about your use case/ what you're doing with it!

▲

59nadir 2 days ago | parent [-]

I'm just messing around with building agents, that's all. I'm not super interested in making ones that just sit in a terminal executing shell scripts because truth be told they're absolutely trivial to make and don't show any interesting parts of LLMs, whereas telling an agent that they are sitting in Kakoune is a whole lot more interesting and really shows a lot of what LLMs aren't great at, and how they'll have to fight their urge to spit out overwrought bash invocations or at the very least find a way to fit those into something new.

So far the only tools the agent has access to are `evaluate_commands(commands=["...", "..."])` and `get_buffer_contents()`, which really makes them have to work for doing things. I could make it super easy for them but then it wouldn't be an interesting experiment.

	▲	59nadir a day ago \| parent [-]
		As an addendum to this: If I were to try to make something more useful out of this, I'd probably add the ability for LLMs to list buffers, probably give them an easier out for executing shell scripts in the way they prefer, give them an easier time to list docs and a few other things like that. The tools and the interaction with Kakoune is really trivial to write; I already use this by having the agent write to the session FIFO (a very simple binary format) and I extract information via my own FIFO that Kakoune writes to (this is used for the buffer data only right now). I think once you started using it more as a tool and not a pseudo-benchmark like I am you'd probably think of even more things to add but a lot of it comes down to just making Kakoune's state visible and making shell spam (which the LLMs love) easier.