trueno 2 days ago

When you say "same prompt", do you mean a similar prompt, with something in the middle deciding "this is basically the same question"? Or is it looking for someone who, for whatever reason, prompted, then copied and pasted that prompt and submitted it again word for word?

kaliades 2 days ago | parent [-]

Exact match, word for word. agent-cache takes everything that defines an LLM request - which model you're calling (gpt-4o, Claude, etc.), the full conversation history (system prompt + user messages + assistant responses), sampling parameters like temperature, and any tool/function definitions the model has access to - serializes it all into a canonical JSON string with sorted keys, and hashes it with SHA-256. That hash is the cache key in Valkey. Same inputs down to the last character = cache hit, anything different = miss.
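A minimal sketch of that keying scheme, assuming Node's built-in `crypto` module (the field names on the request object here are illustrative, not agent-cache's actual schema):

```typescript
import { createHash } from "node:crypto";

// Recursively sort object keys so that two logically identical requests
// always serialize to the same canonical JSON string.
function canonicalize(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value as Record<string, unknown>)
        .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
        .map(([k, v]) => [k, canonicalize(v)])
    );
  }
  return value;
}

// Hash everything that defines the request into a single cache key.
function cacheKey(request: {
  model: string;
  messages: { role: string; content: string }[];
  temperature?: number;
  tools?: unknown[];
}): string {
  const canonical = JSON.stringify(canonicalize(request));
  return createHash("sha256").update(canonical).digest("hex");
}
```

Because keys are sorted before serialization, two requests that differ only in property order hash identically, while any change to the content (model, a message, temperature, a tool definition) produces a different key and therefore a cache miss.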

If you want the 'basically the same question' behavior, that's our other package - @betterdb/semantic-cache. It embeds the prompt as a vector and does similarity search, so 'What is the capital of France?' and 'Capital city of France?' both hit. The trade-off is it needs valkey-search for the vector index, while agent-cache works on completely vanilla Valkey with no modules.
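To illustrate the difference, here is a toy semantic cache using a bag-of-words "embedding" and cosine similarity. This is purely for demonstration; the real package would use model embeddings and a vector index (valkey-search), and the 0.6 threshold is an arbitrary assumption:

```typescript
// Toy word-count "embedding" -- stands in for a real embedding model.
function embed(text: string): Map<string, number> {
  const vec = new Map<string, number>();
  for (const word of text.toLowerCase().match(/[a-z]+/g) ?? []) {
    vec.set(word, (vec.get(word) ?? 0) + 1);
  }
  return vec;
}

// Cosine similarity between two sparse vectors.
function cosine(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0, na = 0, nb = 0;
  for (const [k, v] of a) { dot += v * (b.get(k) ?? 0); na += v * v; }
  for (const v of b.values()) nb += v * v;
  return na && nb ? dot / Math.sqrt(na * nb) : 0;
}

class SemanticCache {
  private entries: { vec: Map<string, number>; answer: string }[] = [];
  constructor(private threshold = 0.6) {}

  set(prompt: string, answer: string): void {
    this.entries.push({ vec: embed(prompt), answer });
  }

  // Return the cached answer for the most similar prompt above the
  // threshold, or undefined on a miss.
  get(prompt: string): string | undefined {
    const q = embed(prompt);
    let best: { score: number; answer: string } | undefined;
    for (const e of this.entries) {
      const s = cosine(q, e.vec);
      if (s >= this.threshold && (!best || s > best.score)) {
        best = { score: s, answer: e.answer };
      }
    }
    return best?.answer;
  }
}
```

With this toy setup, 'Capital city of France?' hits the entry stored for 'What is the capital of France?', while an unrelated question misses. The exact-match cache above would miss both rewordings; that is the trade-off.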

In practice, agent-cache hits its cache less often than semantic-cache would, but when it does hit, you know the result is correct - there's no chance of returning a response for a question that was similar but not actually the same.

traceseal a day ago | parent | next [-]

Why would the same prompt run twice in the first place? And if you've already burned the tokens to get the first answer, what does the second answer gain - do you record it to teach your agents to look locally in the future, or?

kaliades a day ago | parent [-]

[dead]

ivanvmoreno a day ago | parent | prev [-]

[dead]