0xbadcafebee 2 hours ago

When LLMs write bash one-liners today, the result is often wrong. CLI programs cover a huge range of functionality that differs across versions and platforms, with extra abstractions, unpredictable errors, and no types or schemas. The CLI is kinda like a language, but much more abstract, and this confuses the LLM. Imagine if the English language changed as often and as widely as a CLI program's arguments, options, and outputs can across versions and platforms. On the other hand, if the LLM writes Python instead of bash, the results are often more reliable for the same task, since Python changes less frequently, is more specific, can be syntax-checked, has standard metadata, has more expressive logic, etc. But applications also expose a lot of useful functionality that libraries often don't offer, so there are limits.
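To make the "can be syntax-checked" point concrete, here is a minimal sketch using the standard library's `ast.parse` to reject a generated script before anything runs. The helper name `check_generated_python` and the two sample snippets are made up for illustration. It only catches syntax errors, not logic bugs.

```python
import ast
from typing import Tuple


def check_generated_python(source: str) -> Tuple[bool, str]:
    """Parse a candidate script without executing it.

    Hypothetical helper: returns (ok, message).
    """
    try:
        ast.parse(source)
    except SyntaxError as exc:
        # lineno and msg are standard SyntaxError attributes
        return False, f"line {exc.lineno}: {exc.msg}"
    return True, "ok"


# Sample LLM outputs (invented for this sketch)
good = "import shutil\nprint(shutil.disk_usage('/').free)\n"
bad = "for f in files\n    print(f)\n"  # missing colon

good_result = check_generated_python(good)
bad_result = check_generated_python(bad)
```

There's no equally cheap check for a bash one-liner that also knows whether a flag exists in the installed version of a tool, which is kind of the whole problem.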

We do need more tools for the AI to turn our requests into deterministic, reliable, correct results. But this isn't a DSL thing; it's more like a pipeline of steps to get from A to Z. This will likely require multiple bidirectional passes, to confirm with the human at each step, and to fix and re-do the pipeline when a mistake is found. You could encode the final result in some kind of DSL, but it'd only be useful as a read-only artifact: changing a line of it without extensive testing in an immutable environment would introduce bugs. We need to lean more into reliability with LLMs since they are so fallible.
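A rough sketch of what I mean by a pipeline with confirmation and redo, under heavy assumptions: `run_pipeline`, the step functions, and the `confirm` callback are all hypothetical names, and the steps here are toy string functions standing in for LLM calls. In real use `confirm` would ask the human.

```python
from typing import Callable, List, Tuple

# (step name, fn(state, attempt) -> new state); hypothetical shape
Step = Tuple[str, Callable[[str, int], str]]


def run_pipeline(
    request: str,
    steps: List[Step],
    confirm: Callable[[str, str], bool],
    max_attempts: int = 3,
) -> str:
    """Run steps in order; re-do a step until confirm() accepts its output."""
    state = request
    for name, fn in steps:
        for attempt in range(max_attempts):
            candidate = fn(state, attempt)
            if confirm(name, candidate):
                state = candidate  # accepted, move to the next step
                break
        else:
            # every attempt was rejected: stop instead of guessing
            raise RuntimeError(f"step {name!r} rejected {max_attempts} times")
    return state


# Toy steps standing in for model calls
demo_steps: List[Step] = [
    ("plan", lambda s, attempt: s.strip().lower()),
    ("draft", lambda s, attempt: f"{s} (draft {attempt})"),
]


def demo_confirm(name: str, candidate: str) -> bool:
    # Stand-in for the human: reject the first draft, accept the rest
    return "draft 0" not in candidate


demo_result = run_pipeline("  List files ", demo_steps, demo_confirm)
```

The point of failing hard after `max_attempts` is that a rejected step should go back to the human, not silently pass through.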