solid_fuel 13 hours ago

"Agents" can't think and LLMs aren't sentient. They aren't suited to be your coworker, but they also aren't suited for general computational tasks. The chat interface is all that there is, and their behavior in chat is not deterministic or bounded enough to be useful in most applications. They mimic tokens in reply to the tokens you give them, and that is all.

You know what's a bad idea from an engineering (that thinky thing we used to do as part of building software) perspective?

Building a dependency on an expensive remote API into your system.

This isn't just me bloviating, I've been down this road before. In my case I had a project using LLMs to automatically edit videos provided by Hollywood content owners. It seemed like a decent application, but LLMs are structurally unsuited for dealing with user data like this. The way that the prompt is evaluated means there is no separation between system and user input, so once you start dealing with a wide variety of topics you pretty quickly run into walls.

One example - ChatGPT refused to summarize and pick a top segment from a news program because it contained references to a murder-suicide, and both murder and suicide are among the many prohibited topics filtered from ChatGPT replies. This was through their API, not the regular user interface, so it is in theory as unrestricted as access gets. But because the LLM cannot be trusted to behave properly around the topic, they have to filter anything that touches it.

Structurally, I don't see a way this can be overcome - LLMs by design mix the entire prompt together, it's not like a parameterized SQL query where you can isolate the user and system data. That means that a long or bold enough user input is often enough to outweigh the system prompt, and that causes the LLM to veer into unpredictable territory.
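To make the SQL comparison concrete, here's a minimal sketch (using Python's sqlite3 and an invented toy table) of the boundary that parameterized queries give you and that prompts lack:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER, body TEXT)")
conn.execute("INSERT INTO docs VALUES (1, 'hello')")

user_input = "x' OR '1'='1"  # hostile input

# Parameterized query: the driver treats user_input strictly as data,
# so it can never change the structure of the query itself.
safe = conn.execute(
    "SELECT id FROM docs WHERE body = ?", (user_input,)
).fetchall()
# safe == [] -- the injection attempt matches nothing

# Naive string concatenation: the input rewrites the query's logic.
unsafe = conn.execute(
    f"SELECT id FROM docs WHERE body = '{user_input}'"
).fetchall()
# unsafe == [(1,)] -- the OR clause matched every row

# An LLM prompt is structurally the second case: system and user text
# are concatenated into one token stream before the model ever sees
# them, so there is no driver-level boundary to enforce.
system_prompt = "Summarize the following transcript:"
prompt = system_prompt + "\n" + user_input
```

The prompt at the end is just one string, which is the whole problem: nothing downstream can tell which tokens were instructions and which were data.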