Reminds me of the terminus agent/harness on the terminal-bench coding benchmark - they just send send keystrokes to a tmux session. They score pretty well.
https://www.tbench.ai/news/terminus