Remix.run Logo
d4rkp4ttern a day ago

This is consistent with how I've defined the 3 core elements of an agent:

- Intelligence (the LLM)

- Autonomy (loop)

- Tools to have "external" effects

Wrinkles that I haven't seen discussed much are:

(1) Tool-forgetting: LLM forgets to call a tool (and instead outputs plain text). Some may say that these concerns will disappear as frontier models improve, there will always be a need for having your agent scaffolding work well with weaker LLMs (cost, privacy, etc), and as long as the model is stochastic there will always be a chance of tool-forgetting.

(2) Task-completion-signaling: Determining when a task is finished. This has 2 sub-cases: (2a) we want the LLM to decide that, e.g. search with different queries until desired info found, (2b) we want to specify deterministic task completion conditions, e.g., end the task immediately after structured info extraction, or after acting on such info, or after the LLM sees the result of that action etc.

After repeatedly running into these types of issues in production agent systems, we’ve added mechanisms for these in the Langroid[1] agent framework, which has blackboard-like loop architecture that makes it easy to incorporate these.

For issue (1) we can configure an agent with a `handle_llm_no_tool` [2] set to a “nudge” that is sent back to the LLM when a non-tool response is detected (it could also be set as a lambda function to take other possible actions). As others have said, grammar-based constrained decoding is an alternative but only works for LLM-APIs that support.

For issue (2a) Langroid has a DSL[3] for specifying task termination conditions. It lets you specify patterns that trigger task termination, e.g.

- "T" to terminate immediately after a tool-call,

- "T[X]" to terminate after calling the specific tool X,

- "T,A" to terminate after a tool call, and agent handling (i.e. tool exec)

- "T,A,L" to terminate after tool call, agent handling, and LLM response to that

For (2b), in Langroid we rely on tool-calling again, i.e. the LLM must emit a specific DoneTool to signal completion. In general we find it useful to have orchestration tools for unambiguous control flow and message flow decisions by the LLM [4].

[1] Langroid https://github.com/langroid/langroid

[2] Handling non-tool LLM responses https://langroid.github.io/langroid/notes/handle-llm-no-tool...

[3] Task Termination in Langroid https://langroid.github.io/langroid/notes/task-termination/

[4] Orchestration Tools: https://langroid.github.io/langroid/reference/agent/tools/or...