Remix.run Logo
jacobgold 5 hours ago

Making systems fully deterministic ignores the entire purpose of having agents involved.

IMHO the best of both worlds option is agents working with deterministic CLIs. Where the agent does the reasoning (and text generation) but uses CLIs to carry out all of the actions (issuing refunds, unblocking accounts, or whatever).

It's possible to get very reliable and consistent work out of agents when they're using well written prompts with well designed CLIs.

variety8675 5 hours ago | parent | next [-]

Isn't this how we end up with things like: https://www.reuters.com/legal/government/high-profile-meta-a...

jacobgold 4 hours ago | parent | next [-]

Yes: https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/

Although you can certainly do a better-and-worse job of preventing these kinds of issues.

4 hours ago | parent | prev [-]
[deleted]
bethekidyouwant 5 hours ago | parent | prev [-]

How else would anyone do something like issue a refund if not through a programmatic interface?

sqquima 5 hours ago | parent | next [-]

Direct access to the database, and create the "refund program" on the fly. Yes, stuff of nightmares.

jcgrillo 4 hours ago | parent | next [-]

yes... ha ha ha... yes!

bethekidyouwant 5 hours ago | parent | prev [-]

Right thats just head cannon though. Unless of course you believe the lies you read on the Internet.

jacobgold 5 hours ago | parent | prev [-]

At some level everything an agent does is through a "programmatic interface" (tool calls).

Some people might use skill-based scripts, MCPs, or some kind of raw access to a database. My point is that well designed CLIs are the optimal programmatic interface, for many reasons.

bethekidyouwant 5 hours ago | parent [-]

Sorry what other option is there? Is it going to create an API call from scratch every time after reading a page of documentation?

Wait raw access to the database? That’s one of the options for issuing a refund?

5 hours ago | parent | next [-]
[deleted]
cflewis 4 hours ago | parent | prev [-]

Yes, it can do.

At Big Tech Company I Work At the LLM is quite happy to make raw API calls. If it thinks the data is big, then it'll write a Python tool to do it.

The reason crafted backing CLIs are useful is you can guide the LLM towards stuff that is immediately useful rather than hoping the nondetermism can separate the wheat from the chaff.

Take CI: is it interesting to know which tests passed? Maybe, but probably not. What is really interesting is what failed. Instead of having the LLM go out and talk directly to the CI system, write an intermediate CLI that filters out less actionable stuff by default, and have a flag that'll deliver the full dump if necessary.

It's a skill to do this stuff, and it's a lot of hard won experience than something I think is easily teachable. You kind of have to feel out your model and how it "thinks" about solving problems.

And then a new model version comes out and you have to learn it all again!