smithkl42 9 hours ago

It's all about managing context. The bitter lesson applies over the long haul - and yes, over the long haul, as context windows get larger or go away entirely with different architectures, this sort of thing won't be needed. But we've defined enough skills in the last month or two that if we were to put them all in CLAUDE.md, we wouldn't have any context left for coding. I can only imagine that this will be a temporary standard, but given the current state of the art, it's a helpful one.
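As a rough sketch of why this saves context (the file layout and helper names here are my own illustration, not Anthropic's): only a short index of skill names and descriptions stays resident, and a skill's full instructions are read in only when it's actually invoked.

    from pathlib import Path

    SKILLS_DIR = Path(".claude/skills")  # assumed layout: one folder per skill

    def skill_index() -> str:
        # The small, always-in-context index: one line per skill.
        lines = []
        for skill in sorted(p for p in SKILLS_DIR.iterdir() if p.is_dir()):
            first_line = (skill / "SKILL.md").read_text().splitlines()[0]
            lines.append(f"- {skill.name}: {first_line}")
        return "\n".join(lines)

    def load_skill(name: str) -> str:
        # The full skill body enters context only when the model invokes it.
        return (SKILLS_DIR / name / "SKILL.md").read_text()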

OtherShrezzing 8 hours ago | parent | next [-]

I use Claude pretty extensively on a 2.5m loc codebase, and it's pretty decent at just reading the relevant readme docs & docstrings to figure out what's what. Those docs were written for human audiences years (sometimes decades) ago.

I'm very curious to know the size & state of a codebase where skills are beneficial over just having good information hierarchy for your documentation.

pertymcpert 6 hours ago | parent [-]

Skills are more than code documentation. They can apply to anything that the model has to do, outside of coding.

iainmerrick 5 hours ago | parent | prev | next [-]

To clarify, when I mentioned the bitter lesson I meant putting effort into organising the "skills" documentation in a very specific way (headlines, descriptions, etc).

Splitting the docs into neat modules is a good idea (for both human readers and current AIs) and will continue to be a good idea for a while at least. Getting pedantic about filenames, documentation schemas and so on is just bikeshedding.

storus 8 hours ago | parent | prev | next [-]

Why not replace the context tokens on the GPU during inference once they're no longer relevant? E.g. some tool reads a 50k-token document, the LLM processes it, then those document tokens get flushed out of the active context, the KV caches are rebuilt, and only a short log entry stays in context along the lines of "I already did this ... with this result"?
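At the message level, the idea would look something like the following (a sketch only; the actual proposal is about evicting tokens from the KV cache on the GPU, and these names are made up):

    def compact_tool_result(messages, summary, max_keep_chars=200):
        # Once the model has extracted what it needs, replace the bulky tool
        # result with a short log entry; the transcript shrinks, and the KV
        # cache can be rebuilt from the shorter context.
        compacted = []
        for msg in messages:
            if msg.get("role") == "tool" and len(msg.get("content", "")) > max_keep_chars:
                compacted.append({"role": "tool",
                                  "content": f"[document elided] {summary}"})
            else:
                compacted.append(msg)
        return compacted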

killerstorm 7 hours ago | parent | next [-]

Anthropic added features like this in the 4.5 release:

https://claude.com/blog/context-management

> Context editing automatically clears stale tool calls and results from within the context window when approaching token limits.

> The memory tool enables Claude to store and consult information outside the context window through a file-based system.

But it looks like nobody has it as part of the inference loop yet: I guess it's hard to train (you'd need a training set that's a good match for how people actually use context in practice), and it makes inference more complicated. More high-level context management is just easier to implement, and it's one of the things that "GPT wrapper" companies can do, so why bother?
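For what the quoted memory tool is getting at, a bare-bones file-backed store looks roughly like this (illustration only; the real tool's API and file layout are Anthropic's, not what's shown here):

    import json
    from pathlib import Path

    MEMORY_FILE = Path("memory.json")  # hypothetical location

    def remember(key: str, value: str) -> None:
        # Persist a fact outside the context window.
        data = json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else {}
        data[key] = value
        MEMORY_FILE.write_text(json.dumps(data, indent=2))

    def recall(key: str):
        # Consult stored facts later without carrying them in context.
        if not MEMORY_FILE.exists():
            return None
        return json.loads(MEMORY_FILE.read_text()).get(key)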

zozbot234 8 hours ago | parent | prev [-]

This is what agent calls do under the hood, yes.

storus 7 hours ago | parent [-]

I don't think so. Those things happen when the agent yields control back at the end of its inference call, not during active inference while multiple tool calls are in flight. These days an agent can finish a whole task with thousands of tool calls in a single inference call, without ever yielding control back to whatever called it so it can do housekeeping.

vidarh 5 hours ago | parent [-]

For agent, read sub-agent, e.g. the contents of your .claude/agents directory. When Claude Code spins up an agent, it provides the sub-agent with a prompt that combines the agent's own prompt with information Claude composes from the outer context, based on what it thinks needs to be communicated to the agent. Claude Code can either continue, with the sub-agent running in the background, or wait until it completes. In either case, by default, Claude Code effectively gets to "check in" on messages from the sub-agent without seeing everything it did (e.g. tool call results etc.), so only a small proportion of what the sub-agent does makes it into the main agent's context.

So if you want to do this, the current workaround is basically to have a sub-agent carry out tasks you don't want to pollute the main context.

I have lots of workflows that get farmed out to sub-agents, which write reports to disk and hand a summary back to the main agent; the main agent then selectively reads parts of the report instead of having to process the full source material, or even the whole report.
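The shape of that workflow, with the driver and summarizer left abstract (both names are stand-ins for whatever actually runs the sub-agent):

    from pathlib import Path
    from typing import Callable

    def farm_out(task: str, report_path: str,
                 run_subagent: Callable[[str], str],
                 summarize: Callable[[str], str]) -> str:
        # The sub-agent does the heavy reading in its own context window,
        # the full report goes to disk, and only a short summary (plus a
        # pointer to the report) returns to the main agent's context.
        full_report = run_subagent(task)
        Path(report_path).write_text(full_report)
        return f"{summarize(full_report)}\n(full report: {report_path})"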

ledauphin 9 hours ago | parent | prev | next [-]

How is it different from, or better than, maintaining an index page for your docs? Or a folder full of docs plus an instruction for Claude to `ls` the folder on startup?

d1sxeyes 8 hours ago | parent | next [-]

Vercel think it isn’t:

https://vercel.com/blog/agents-md-outperforms-skills-in-our-...

Avicebron 8 hours ago | parent | prev [-]

It's hard to tell unless they give some hard data comparing the approaches systematically. This feels like a grift, or, more charitably, an attempt to build a presence/market around nothing. But who knows anymore; apparently saying "tell the agent to write its own docs for reference and context continuity" is considered a revelation.

stingraycharles 8 hours ago | parent | prev [-]

Not sure why you’re being downvoted so much, it’s a valid point.

It’s also related to attention: invoking a skill “now” means the model has all the relevant information fresh in context, so you get much better results.

What I’m doing myself is writing skills that invoke Python scripts that “inject” prompts. This way you can set up multi-turn workflows for, e.g., codebase analysis, deep thinking, root cause analysis, etc.

Works very well.
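Roughly what one of those scripts can look like (step names and prompts here are invented; the point is just that whatever the script prints becomes the next instruction the model acts on):

    #!/usr/bin/env python3
    import sys

    # Each step prints the next prompt in a multi-turn workflow; the skill
    # tells the agent which step to run and when.
    STEPS = {
        "analyze": "List the modules touched by this change and summarise their responsibilities.",
        "root-cause": "For each failing test, trace the failure back to a specific change.",
        "report": "Write up the findings in ANALYSIS.md with one section per module.",
    }

    if __name__ == "__main__":
        step = sys.argv[1] if len(sys.argv) > 1 else "analyze"
        print(STEPS.get(step, f"Unknown step '{step}'; valid steps: {', '.join(STEPS)}"))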