Remix.run Logo
btown 12 hours ago

It seems the benchmarks here are heavily biased towards single-shot explanatory tasks, not agentic loops where code is generated: https://github.com/drona23/claude-token-efficient/blob/main/...

And I think this raises a really important question. When you're deep into a project that's iterating on a live codebase, does Claude's default verbosity, where it's allowed to expound on why it's doing what it's doing when it's writing massive files, allow the session to remain more coherent and focused as context size grows? And in doing so, does it save overall tokens by making better, more grounded decisions?

The original link here has one rule that says: "No redundant context. Do not repeat information already established in the session." To me, I want more of that. That's goal-oriented quasi-reasoning tokens that I do want it to emit, visualize, and use, that very possibly keep it from getting "lost in the sauce."

By all means, use this in environments where output tokens are expensive, and you're processing lots of data in parallel. But I'm not sure there's good data on this approach being effective for agentic coding.

sillysaurusx 12 hours ago | parent | next [-]

I wrote a skill called /handoff. Whenever a session is nearing a compaction limit or has served its usefulness, it generates and commits a markdown file explaining everything it did or talked about. It’s called /handoff because you do it before a compaction. (“Isn’t that what compaction is for?” Yes, but those go away. This is like a permanent record of compacted sessions.)

I don’t know if it helps maintain long term coherency, but my sessions do occasionally reference those docs. More than that, it’s an excellent “daily report” type system where you can give visibility to your manager (and your future self) on what you did and why.

Point being, it might be better to distill that long term cohesion into a verbose markdown file, so that you and your future sessions can read it as needed. A lot of the context is trying stuff and figuring out the problem to solve, which can be documented much more concisely than wanting it to fill up your context window.

EDIT: Someone asked for installation steps, so I posted it here: https://news.ycombinator.com/item?id=47581936

dataviz1000 12 hours ago | parent | next [-]

Did you call it '/handoff' or did Claude name it that? The reason I'm asking is because I noticed a pattern with Claude subtly influencing me. For example, the first time I heard the the word 'gate' was from Claude and 1 week later I hear it everywhere including on Hacker News. I didn't use the word 'handoff' but Claude creates handoff files also [0]. I was thinking about this all day. Because Claude didn't just use the word 'gate' it created an entire system around it that includes handoffs that I'm starting to see everywhere. This might mean Claude is very quietly leading and influencing us in a direction.

[0] https://github.com/search?q=repo%3Aadam-s%2Fintercept%20hand...

sillysaurusx 12 hours ago | parent | next [-]

I was reading through the Claude docs and it was talking about common patterns to preserve context across sessions. One pattern was a "handoff file", which they explained like "have claude save a summary of the current session into a handoff file, start a new session, then tell it to read the file."

That sounded like a nice idea, so I made it effortless beyond typing /handoff.

The generated docs turned out to be really handy for me personally, so I kept using it, and committed them into my project as they're generated.

dataviz1000 12 hours ago | parent [-]

Oh, so the word 'gate' is probably in the documentation also!

I see. So this isn't as scary. Claude is helping me understand how to use it properly.

perching_aix a few seconds ago | parent | next [-]

I have a tough time navigating what swings this topic between scary and not scary for you.

Unless you're a believer of souls, free will, and other spiritualistic hogwash, it should be clear that everything you read (and in general, experience) biases you. LLM output is no different.

nerdsniper an hour ago | parent | prev | next [-]

I have noticed similar phenomena with Claude, where its vocabulary subtly shifts how I think/frame/write about things or points me to subtle gaps in my own understanding. And I also usually come around to understand that it's often not arbitrary. But I do think some confirmation bias is at play: when it tries to shift me into the wrong directions repeatedly, I learn how to make it stop doing that.

It definitely adds a layer of cognitive load, in wrangling/shepherding/accomodating/accepting the unpredictable personalities and stochastic behaviors of the agents. It has strong default behaviors for certain small tasks, and where humans would eventually habituate prescribed procedures/requirements, the LLM's never really internalize my preferences. In that way, they are more like contractors than employees.

airstrike 11 hours ago | parent | prev [-]

Why would it be scary? Claude is just parroting other human knowledge. It has no goal or agency.

adrianN 9 hours ago | parent | next [-]

You can’t verify that there is no influence by the makers of Claude.

airstrike 4 minutes ago | parent [-]

[delayed]

fwipsy 10 hours ago | parent | prev [-]

By that logic, nothing computers do is scary.

OJFord 6 hours ago | parent | next [-]

Yes I think that is their argument.

20 minutes ago | parent [-]
[deleted]
rendx 6 hours ago | parent | prev [-]

Computer don't do anything.

perching_aix 22 minutes ago | parent [-]

What's their value then?

jstanley 5 hours ago | parent | prev | next [-]

FWIW I have worked with people using the word "gate" for years.

For example, "let's gate the new logic behind a feature flag".

ProofHouse 8 hours ago | parent | prev | next [-]

They all are. This is proven in research. https://medium.com/data-science-collective/the-ai-hivemind-p...

creamyhorror 6 hours ago | parent | prev [-]

I've started saying "gate" and "bound(ed)" and "handoff" a lot (and even "seam" and "key off" sometimes) since Codex keeps using the terms. They're useful, no doubt, but AI definitely seems to prefer using them.

flashgordon 10 hours ago | parent | prev | next [-]

I've actually been doing this for a year. I call it /checkpoint instead and it does some thing like:

* update our architecture.md and other key md files in folders affected by updates and learnings in this session. * update claude.md with changes in workflows/tooling/conventions (not project summaries) * commit

It's been pretty good so far. Nothing fancy. Recently I also asked to keep memories within the repo itself instead of in ~/.claude.

Only downside is it is slow but keeps enough to pass the baton. May be "handoff" would have been a better name!

chermi 11 hours ago | parent | prev | next [-]

Did the same. Although I'm considering a pipeline where sessions are periodically translated to .md with most tool outputs and other junk stripped and using that as source to query against for context. I am testing out a semi-continuous ingestion of it in to my rag/knowledge db.

david_allison 12 hours ago | parent | prev | next [-]

Is this available online? I'd love documentation of my prompts.

sillysaurusx 12 hours ago | parent [-]

I’ll post it here, one minute.

Ok, here you go: https://gist.github.com/shawwn/56d9f2e3f8f662825c977e6e5d0bf...

Installation steps:

- In your project, download https://gist.github.com/shawwn/56d9f2e3f8f662825c977e6e5d0bf... into .claude/commands/handoff.md

- In your project's CLAUDE.md file, put "Read `docs/agents/handoff/*.md` for context."

Usage:

- Whenever you've finished a feature, done a coherent "thing", or otherwise want to document all the stuff that's in your current session, type /handoff. It'll generate a file named e.g. docs/agents/handoff/2026-03-30-001-whatever-you-did.md. It'll ask you if you like the name, and you can say "yes" or "yes, and make sure you go into detail about X" or whatever else you want the handoff to specifically include info about.

- Optionally, type "/rename 2026-03-23-001-whatever-you-did" into claude, followed by "/exit" and then "claude" to re-open a fresh session. (You can resume the previous session with "claude 2026-03-23-001-whatever-you-did". On the other hand, I've never actually needed to resume a previous session, so you could just ignore this step entirely; just /exit then type claude.)

Here's an example so you can see why I like the system. I was working on a little blockchain visualizer. At the end of the session I typed /handoff, and this was the result:

- docs/agents/handoff/2026-03-24-001-brownie-viz-graph-interactivity.md: https://gist.github.com/shawwn/29ed856d020a0131830aec6b3bc29...

The filename convention stuff was just personal preference. You can tell it to store the docs however you want to. I just like date-prefixed names because it gives a nice history of what I've done. https://github.com/user-attachments/assets/5a79b929-49ee-461...

Try to do a /handoff before your conversation gets compacted, not after. The whole point is to be a permanent record of key decisions from your session. Claude's compaction theoretically preserves all of these details, so /handoff will still work after a compaction, but it might not be as detailed as it otherwise would have been.

creamyhorror 6 hours ago | parent | next [-]

I already do this manually each time I finish some work/investigation (I literally just say

"write a summary handoff md in ./planning for a fresh convo"

and it's generally good enough), but maybe a skill like you've done would save some typing, hmm

My ./planning directory is getting pretty big, though!

addandsubtract 6 hours ago | parent | prev | next [-]

Thanks! The last link is broken, though, or maybe you didn't mean to include it? Also, if you've never actually resumed a session, do you use these docs at some other time? Do you reference them when working on a related feature, or just keep them for keepsake to track what you've done and why?

david_allison 10 hours ago | parent | prev | next [-]

Oh wow, thank you so much!!!!!

cruffle_duffle 9 hours ago | parent | prev [-]

Thanks!!!

mlrtime 4 hours ago | parent | prev | next [-]

Wouldn't the next phase of this be automatic handoffs executed with hooks?

Your system is great and I do similar, my problem is I have a bunch of sessions and forget to 'handoff'.

The clawbots handle this automatically with journals to save knowledge/memory.

dominotw 2 hours ago | parent [-]

when work on task i have task/{name}.md that write a running log to. is this not a common workflow?

DeathArrow 9 hours ago | parent | prev [-]

I think Cursor does something similar under the hood.

alsetmusic 9 hours ago | parent | prev | next [-]

> No explaining what you are about to do. Just do it.

Came here for the same reason.

I can't calculate how many times this exact section of Claude output let me know that it was doing the wrong thing so I could abort and refine my prompt.

8 hours ago | parent [-]
[deleted]
hatmanstack 12 hours ago | parent | prev | next [-]

Seems crazy to me people aren't already including rules to prevent useless language in their system/project lvl CLAUDE.md.

As far as redundancy...it's quite useful according to recent research. Pulled from Gemini 3.1 "two main paradigms: generating redundant reasoning paths (self-consistency) and aggregating outputs from redundant models (ensembling)." Both have fresh papers written about their benefits.

wongarsu 5 hours ago | parent | next [-]

There was also that one paper that had very noticeable benchmark improvements in non-thinking models by just writing the prompt twice. The same paper remarked how thinking models often repeat the relevant parts of the prompt, achieving the same effect.

Claude is already pretty light on flourishes in its answers, at least compared to most other SotA models. And for everything else it's not at all obvious to me which parts are useless. And benchmarking it is hard (as evidenced by this thread). I'd rather spend my time on something else

whattheheckheck 10 hours ago | parent | prev [-]

No such thing as junk DNA kinda applies here

scosman 12 hours ago | parent | prev | next [-]

also: inference time scaling. Generating more tokens when getting to an answer helps produce better answers.

Not all extra tokens help, but optimizing for minimal length when the model was RL'd on task performance seems detrimental.

joquarky 6 hours ago | parent [-]

I liked playing with the completion models (davinci 2/3). It was a challenge to arrange a scenario for it to complete in a way that gave me the information I wanted.

That was how I realized why the chat interfaces like to start with all that seemingly unnecessary/redundant text.

It basically seeds a document/dialogue for it to complete, so if you make it start out terse, then it will be less likely to get the right nuance for the rest of the inference.

dataviz1000 7 hours ago | parent | prev | next [-]

I made a test [0] which runs several different configurations against coding tasks from easy to hard. There is a test which it has to pass. Because of temperature, the number of tokens per one shot vary widely with all the different configurations include this one. However, across 30 tests, this does perform worse.

[0] https://github.com/adam-s/testing-claude-agent

baq 5 hours ago | parent | prev | next [-]

if the model gets dumber as its context window is filled, any way of compressing the context in a lossless fashion should give a multiplicative gain in the 50% METR horizon on your tasks as you'll simply get more done before the collapse. (at least in the spherical cow^Wtask model, anyway.)

9 hours ago | parent | prev | next [-]
[deleted]
heyethan 8 hours ago | parent | prev [-]

[dead]