skybrian a day ago

I'm wondering: what did you do when you made it log every tool call? (I mean, that happens automatically as part of the chat transcript, but what did you do that made it catch on?)

JohnMakin a day ago

Yeah, I was aware it normally stores this. At the time, I just wanted to see whether it could reliably record itself by writing every tool call to a file on its own. (I don't know what I was trying to prove, other than being mildly curious whether it could be relied on to audit itself.)

Right at the start, in the "thinking" block it displays, it said something to the effect of (I'm paraphrasing): "This looks like a typical XYZ task, except I need to write down every tool call I'm using. This is good practice; it will give the user visibility into the actions I take and ensure I am following all of the guidelines in XYZ.md."

When I removed the self-logging, I was able to reproduce the deviant behavior I would get during normal workflow sessions, as long as I could make it think it was working on a real task (and since then, I pretty much always have it do real tasks).

This was on 4.6, during that bad (user-reported) regression around March of this year. It did come up with some helpful suggestions and analysis of why certain things were breaking down, pointed out some inconsistencies between its memory files and what its agent files said, etc. Since then I don't really rely on memories at all (at least not ones it self-documents). Instead I use knowledge indexes that I help it write, which has been far more reliable.