rocqua 5 hours ago

When you put an LLM in reasoning mode, it effectively has a conversation with itself, mimicking an inner monologue.

That conversation is held in text, not in any internal representation. That text is called the reasoning trace. You can then analyse that trace.
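
For example, with a local reasoning model served through Ollama, the trace shows up inline as ordinary text you can pull out and inspect. A rough sketch in Python (assumes a model such as deepseek-r1 that wraps its reasoning in <think> tags; exact field names vary by version):

    import re
    import requests

    # Ask a local reasoning model a question via Ollama's chat endpoint.
    resp = requests.post(
        "http://localhost:11434/api/chat",
        json={
            "model": "deepseek-r1",  # assumed: a model that emits <think> tags
            "messages": [{"role": "user", "content": "Is 9.11 bigger than 9.9?"}],
            "stream": False,
        },
    ).json()

    content = resp["message"]["content"]

    # The reasoning trace is just text in the response; split it out for analysis.
    match = re.search(r"<think>(.*?)</think>", content, re.DOTALL)
    reasoning_trace = match.group(1).strip() if match else ""
    final_answer = re.sub(r"<think>.*?</think>", "", content, flags=re.DOTALL).strip()

    print("reasoning trace:\n", reasoning_trace)
    print("final answer:\n", final_answer)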

bandrami 5 hours ago | parent [-]

Unless things have changed drastically in the last 4 months (the last time I looked at it) those traces are not stored but reconstructed when asked. Which is still the same problem.

ehsanu1 5 hours ago | parent | next [-]

They aren't necessarily "stored" but they are part of the response content. They are referred to as reasoning or thinking blocks. The big 3 model makers all have this in their APIs, typically in an encrypted form.

Reconstruction of reasoning from scratch can happen in some legacy APIs like the OpenAI Chat Completions API, which doesn't support passing reasoning blocks around. OpenAI specifically recommends using the newer Responses API to improve both accuracy and latency (by reusing existing reasoning).
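
For instance, with the Responses API you can chain calls so earlier reasoning is reused rather than regenerated. A rough sketch (the model name and the exact shape of reasoning items are illustrative; check the current API docs):

    from openai import OpenAI

    client = OpenAI()

    # First call: the response carries reasoning items alongside the visible output.
    first = client.responses.create(
        model="o4-mini",  # illustrative model name
        input="Plan a refactor of the payment module.",
        reasoning={"effort": "medium"},
    )

    # Reasoning blocks come back as part of the response content.
    reasoning_items = [item for item in first.output if item.type == "reasoning"]

    # Follow-up call: pass previous_response_id so the existing reasoning is
    # reused instead of being reconstructed from scratch.
    follow_up = client.responses.create(
        model="o4-mini",
        input="Now write the first step as a concrete plan.",
        previous_response_id=first.id,
    )

    print(follow_up.output_text)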

tibbar 5 hours ago | parent | prev [-]

For a typical coding agent, intermediate tool-call outputs and LLM commentary are produced while it works on a task and passed back to the LLM as context for follow-up requests. (Hence the term agent: it is an LLM call in a loop.) You can see this easily with e.g. Claude Code, which keeps track of how much space is left in the context and requires "context compaction" as the context gradually fills up over the course of a session.

In this regard, the reasoning trace of an agent is trivially accessible to clients, unlike the reasoning trace of an individual LLM API call; it's a higher level of abstraction. Indeed, I implemented an agent just the other day which took advantage of this. The OP that you originally replied to was discussing an agentic coding process, not an individual LLM API call.
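
A minimal sketch of that loop (call_llm and run_tool are hypothetical stand-ins for a real model client and tool runner), showing why the whole transcript, tool outputs included, sits in the client's hands:

    # Minimal agent loop: an LLM call in a loop, with tool outputs fed back as
    # context. Everything in `transcript` is plain data held by the client, so
    # the agent-level trace is trivially accessible, unlike the internal
    # reasoning of a single LLM call.
    def run_agent(task, call_llm, run_tool, max_steps=20):
        transcript = [{"role": "user", "content": task}]

        for _ in range(max_steps):
            reply = call_llm(transcript)      # one LLM call per iteration
            transcript.append({"role": "assistant", "content": reply["content"]})

            if not reply.get("tool_calls"):   # no tool requests: task is done
                return reply["content"], transcript

            for call in reply["tool_calls"]:
                result = run_tool(call["name"], call["arguments"])
                transcript.append({"role": "tool", "content": result})

        return None, transcript  # step budget hit; the transcript is still inspectable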

bandrami 3 hours ago | parent [-]

Well, right. I see those reasoning stages in reasoning models with Ollama, but if you ask the model afterwards what its reasoning was, what it says is different from what it said at the time.