| ▲ | lacker 4 days ago |
| I'm not sure if I have the right mental model for a "skill". It's basically a context-management tool? Like a skill is a brief description of something, and if the model decides it wants the skill based on that description, then it pulls in the rest of whatever amorphous stuff the skill has, scripts, documents, what have you. Is this the right way to think about it? |
|
| ▲ | simonw 4 days ago | parent | next [-] |
| It's a folder with a markdown file in it plus optional additional reference files and executable scripts. The clever part is that the markdown file has a section in it like this:
https://github.com/datasette/skill/blob/a63d8a2ddac9db8225ee... ---
name: datasette-plugins
description: "Writing Datasette plugins using Python and the pluggy plugin system. Use when Claude needs to: (1) Create a new Datasette plugin, (2) Implement plugin hooks like prepare_connection, register_routes, render_cell, etc., (3) Add custom SQL functions, (4) Create custom output renderers, (5) Add authentication or permissions logic, (6) Extend Datasette's UI with menus, actions, or templates, (7) Package a plugin for distribution on PyPI"
---
On startup Claude Code / Codex CLI etc scan all available skills folders and extract just those descriptions into the context. Then, if you ask them to do something that's covered by a skill, they read the rest of that markdown file on demand before going ahead with the task. |
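The startup scan described above can be sketched in a few lines of Python. This is a hypothetical illustration of the mechanism, not Anthropic's or OpenAI's actual implementation; the folder layout and frontmatter format follow the example above:

```python
from pathlib import Path

def scan_skill_descriptions(skills_root):
    """Collect just the frontmatter name/description of each SKILL.md,
    without loading the skill bodies into context."""
    summaries = []
    for skill_md in sorted(Path(skills_root).glob("*/SKILL.md")):
        text = skill_md.read_text()
        if not text.startswith("---"):
            continue
        # Frontmatter sits between the first two "---" fences
        frontmatter = text.split("---", 2)[1]
        meta = {}
        for line in frontmatter.splitlines():
            if ":" in line:
                key, _, value = line.partition(":")
                meta[key.strip()] = value.strip().strip('"')
        if "name" in meta and "description" in meta:
            summaries.append(f"{meta['name']}: {meta['description']}")
    return summaries
```

Only these one-line summaries land in the system prompt; the body of each SKILL.md stays on disk until the model decides to read it.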
| |
| ▲ | spike021 4 days ago | parent | next [-] | | Apologies for not reading all of your blogs on this, but a follow-up question. Are models still prone to reading these and disregarding them even if they should be used for a task? Reason I ask is because a while back I had similar sections in my CLAUDE.md and it would sometimes acknowledge them and then not use them, or just ignore them entirely. I'm assuming that was more an issue of too much context, and that skill-level files like this will reduce that effect? | | |
| ▲ | jrecyclebin 4 days ago | parent [-] | | Skill descriptions get dumped in your system prompt - just like MCP tool definitions and agent descriptions before them. The more you have, the more the LLM will be unable to focus on any one piece of it. You don't want a bunch of irrelevant junk in there every time you prompt it. Skills are nice because they offload all the detailed prompts to files that the LLM can ask for. It's getting even better with Anthropic's recent switchboard operator (tool search tool) that doesn't clutter the system prompt but tries to cut the tool list down to those the LLM will need. | | |
| ▲ | ithkuil 4 days ago | parent | next [-] | | Can I organize skills hierarchically? If Claude Code loads all definitions into the prompt when many skills are defined, potentially diluting its ability to identify relevant skills, I'd like a system where only broad skill-group summaries load initially, with detailed descriptions loaded on demand when Claude detects that a matching skill group might be useful. | | |
| ▲ | simonw 3 days ago | parent [-] | | There's a mechanism for that built into skills already: a skill folder can also include additional reference markdown files, and the skill can tell the coding agent to selectively read those extra files only when that information is needed on top of the skill. There's an instruction about that in the Codex CLI skills prompt: https://simonwillison.net/2025/Dec/13/openai-codex-cli/ If SKILL.md points to extra folders such as references/, load only the specific files needed for the request; don't bulk-load everything.
| | |
| ▲ | ithkuil 2 days ago | parent [-] | | yes, but those are not quite new skills, right? can those markdown files in the references also, in turn, tell the model to lazily load more references only if it deems them useful? | | |
| ▲ | simonw a day ago | parent [-] | | Yes, using regular English prompting: "If you need to write tests that mock an HTTP endpoint, also go ahead and read the pytest-mock-httpx.md file"
|
|
|
| |
| ▲ | greymalik 3 days ago | parent | prev [-] | | > Anthropic's recent switchboard operator I don’t know what this is and Google isn’t finding anything. Can you clarify? | | |
|
| |
| ▲ | behnamoh 4 days ago | parent | prev | next [-] | | why did this simple idea take so long to become available? I remember even in llama 2 days I was doing this stuff, and that model didn't even support function calling. | | |
| ▲ | simonw 4 days ago | parent | next [-] | | Skills only work if you have a full blown code execution environment with a model that can run ls and cat and execute scripts and suchlike. The models are really good at driving those environments now which makes skills the right idea at the right time. | | |
| ▲ | 4 days ago | parent | next [-] | | [deleted] | |
| ▲ | jstummbillig 4 days ago | parent | prev [-] | | Why do you need code execution envs? Could the skill not just be a function over a business process, do a then b then c? | | |
| ▲ | steilpass 4 days ago | parent [-] | | Turns out that basic shell commands are really powerful for context management. And you get tools which run in shells for free. But yes. Other agent platforms will adopt this pattern. | | |
| ▲ | true2octave 3 days ago | parent [-] | | I prefer to provide CLIs to my agent. I find it powerful how it can leverage and self-discover the best way to use a CLI and its parameters to achieve its goals. It feels more powerful than providing a pre-defined set of functions via MCP, which has less flexibility than a CLI. |
|
|
| |
| ▲ | NiloCK 4 days ago | parent | prev [-] | | I still don't really understand `skills` as ... anything? You said yourself that you've been doing this since llama 2 days - what do you mean by "become available"? It is useful in a user-education sense to communicate that it's good to actively document useful procedures like this, and it is likely a performance / utilization boost that the models are tuned or prompt-steered toward discovering this stuff in a conventional location. But honestly reading about skills mostly feels like reading: > # LLM provider has adopted a new paradigm: prompts > What's a prompt? > You tell the LLM what you'd like to do, and it tries to do it. OR, you could ask the LLM a question and it will answer to the best of its ability. Obviously I'm missing something. | | |
| ▲ | baq 4 days ago | parent [-] | | It’s so simple there isn’t really more to understand. There’s a markdown doc with a summary/abstract section and a full manual section. Summary is always added to the context so the model is aware that there’s something potentially useful stored here and can look up details when it decides the moment is right. IOW it’s a context length management tool which every advanced LLM user had a version of (mine was prompt pieces for special occasions in Apple notes.) |
|
| |
| ▲ | kswzzl 4 days ago | parent | prev | next [-] | | > On startup Claude Code / Codex CLI etc scan all available skills folders and extract just those descriptions into the context. Then, if you ask them to do something that's covered by a skill, they read the rest of that markdown file on demand before going ahead with the task. Maybe I still don't understand the mechanics - this happens "on startup", every time a new conversation starts? Models go through the trouble of doing ls/cat/extraction of descriptions to bring into context? If so it's happening lightning fast and I somehow don't notice. Why not just include those descriptions within some level of system prompt? | | |
| ▲ | simonw 4 days ago | parent [-] | | Yes, it happens on startup of a fresh Claude Code / Codex CLI session. The descriptions effectively get pasted into the system prompt. Reading a few dozen files takes on the order of a few ms. Each skill adds only enough tokens to fit its metadata description, so probably less than 100 per skill. | | |
| ▲ | raybb 4 days ago | parent [-] | | So when it says: > The body can contain any Markdown; it is not injected into context. It just means it's not injected into the context until the skill is used or it's never injected into the context? https://github.com/openai/codex/blob/main/docs/skills.md | | |
| ▲ | simonw 4 days ago | parent [-] | | Yeah, that means that the body of that file will not be injected into the context on startup. I had thought that once the skill is selected the whole file would be read, but it looks like that's not the case: https://github.com/openai/codex/blob/ad7b9d63c326d5c92049abd... 1) After deciding to use a skill, open its `SKILL.md`. Read only enough to follow the workflow.
So you could have a skill file that's thousands of lines long but if the first part of the file provides an outline Codex may stop reading at that point. Maybe you could have a skill that says "see migrations section further down if you need to alter the database table schema" or similar. | | |
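That outline pattern might look something like this (a hypothetical SKILL.md sketch; the skill name and section names are invented for illustration):

```markdown
---
name: database-maintenance
description: "Routine maintenance tasks for the project database. Use when altering schemas, running migrations, or restoring backups."
---

# Database maintenance

Outline of this file; read only the sections you need:

- Common commands: immediately below
- Migrations: see the "Migrations" section further down if you need to alter a table schema
- Backups: see the "Backups" section at the end of this file

## Common commands
...

## Migrations
...

## Backups
...
```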
| ▲ | wahnfrieden 4 days ago | parent | next [-] | | Knowing Codex, I wonder if it might just search for text in the skill file and read around matches, instead of always reading a bit from the top first. | |
| ▲ | debugnik 4 days ago | parent | prev [-] | | Can models actually stream the file in as they see fit, or is "read only enough" just an attention trick? I suspect the latter. | | |
|
|
|
| |
| ▲ | kridsdale1 4 days ago | parent | prev | next [-] | | So it’s a header file. In English. | |
| ▲ | throwaway314155 4 days ago | parent | prev | next [-] | | Do skills get access to the current context or are they a blank slate? | | |
| ▲ | simonw 4 days ago | parent [-] | | They execute within the current context - it's more that the content of the skill gets added to that context when it is needed. |
| |
| ▲ | leetrout 4 days ago | parent | prev [-] | | Have you used AWS bedrock? I assume these get pretty affordable with prompt caching... |
|
|
| ▲ | prescriptivist 4 days ago | parent | prev | next [-] |
Skills have a lot of uses, but one in particular I like is replacing one-off MCP server usage. You can use (or write) an MCP server for your CI system and then add the instructions to your AGENTS.md to query the CI MCP for build results for the current branch. Then you need to find a way to distribute the MCP server so the rest of the team can use it or cook it into your dev environment setup. But all you really care about is one tool in the MCP server, the build result. Or... You can hack together a shell, python, whatever script that fetches build results from your CI server, dumps them to stdout in a semi-structured format like markdown, then add a 10-15 line SKILL.md and you have the same functionality -- the skill just executes the one-off script and reads the output. You package the skill with the script, usually in a directory in the project you are working on, but you can also distribute them as plugins (bundles) that Claude Code can install from a "repository", which can just be a private git repo. It's a little UNIX-y in a way, little tools that pipe output to another tool and they are useful in a standalone context or in a chain of tools. Whereas MCP is a full-blown RPC environment (that has its uses, where appropriate). |
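A minimal sketch of that kind of one-off script in Python. The CI endpoint URL, the JSON shape, and the field names are all invented for illustration:

```python
import json
import sys
import urllib.request

CI_API = "https://ci.example.com/api/builds"  # hypothetical endpoint

def fetch_build_results(branch):
    """Fetch build results for a branch from the (hypothetical) CI API."""
    with urllib.request.urlopen(f"{CI_API}?branch={branch}") as resp:
        return json.load(resp)

def format_build_results(results):
    """Dump results to stdout-friendly, semi-structured markdown
    that the agent can read directly."""
    lines = ["# Build results"]
    for job in results:
        status = "PASS" if job["passed"] else "FAIL"
        lines.append(f"- **{job['name']}**: {status} ({job['duration_s']}s)")
    return "\n".join(lines)

if __name__ == "__main__" and len(sys.argv) > 1:
    print(format_build_results(fetch_build_results(sys.argv[1])))
```

The accompanying SKILL.md then only needs to say, roughly, "to check build results for the current branch, run this script with the branch name and read the markdown it prints."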
| |
| ▲ | wiether 4 days ago | parent [-] | | How do you manage the credentials to request your CI server in this case? Are they hardcoded in the script associated with your SKILL? | | |
| ▲ | true2octave 3 days ago | parent [-] | | Credentials are tied to the service principal of the user. It’s straightforward for cloud services. |
|
|
|
| ▲ | delaminator 4 days ago | parent | prev | next [-] |
Claude Code is not very good at “remembering” its skills. Maybe they get compacted out of the context. But you can call upon them manually. I often do something like “using your Image Manipulation skill, make the icons from image.png” Or “use your web design skill to create a design for the front end” Tbh I do like that. I also get Claude to write its own skills. “Using what we learned from this task, write a skill document called /whatever/ using your writing skills skill” I have a GitHub template including my skills and commands, if you want to see them. https://github.com/lawless-m/claude-skills |
| |
| ▲ | jorl17 3 days ago | parent | next [-] | | I'm so excited for the future, because _clearly_ our technology has loads of room to improve. Even if new models don't come out, the tooling we build upon them, and the way we use them, is sure to improve. One particular way I can imagine this is with some sort of "multipass makeshift attention system" built on top of the mechanisms we have today. I think for sure we can store the available skills in one place and look only at the last part of the query, asking the model the question: "Given this small, self-contained bit of the conversation, do you think any of these skills is a prime candidate to be used?" or "Do you need a little bit more context to make that decision?". We then pass along that model's final answer as a suggestion to the actual model creating the answer. There is a delicate balance between "leading the model on" with imperfect information (because we cut the context) and actually "focusing it" on the task at hand and the skill selection. Well, and, of course, there's the issue of time and cost. I actually believe we will see several solutions make use of techniques such as this, where some model determines what the "big context" model should be focusing on as part of its larger context (in which it may get lost). In many ways, this is similar to what modern agents already do. Cursor doesn't keep files in the context: it constantly re-reads only the parts it believes are important. But I think it might be useful to keep the files in the context (so we don't make an egregious mistake) at the same time that we also find what parts of the context are more important and re-feed them to the model or highlight them somehow. | |
| ▲ | Sammi 4 days ago | parent | prev [-] | | I'm kinda confused about why this even is something that we need an extra feature for when it's basically already built in to the agentic development feature. I just keep a folder of md files and I add whatever one is relevant when it's relevant. It's kinda straightforward to do... Just like you I don't edit much in these files on my own. Mostly just ask the model to update an md file whenever I think we've figured out something new, so the learning sticks. I have files for test writing, backend route writing, db migration writing, frontend component writing etc. Whenever a section gets too big to live in agents.md it gets its own file. | | |
| ▲ | jorl17 3 days ago | parent | next [-] | | Because the concept of skills is not tied to code development :) Of course if that's what you're talking about, you are already very close to the "interface" that skills are presented in, and they are obvious (and perhaps not so useful) But think of your dad or grandma using a generic agent, and simply selecting that they want to have certain skills available to it. Don't even think of it as a chat interface. This is just some option that they set in their phone assistant app. Or, rather, it may be that they actually selected "Determine the best skills based on context", and the assistant has "skill packs" which it periodically determines it needs to enable based on key moments in the conversation or latest interactions. These are all workarounds for the problems of learning, memory...and, ultimately, limited context. But they for sure will be extremely useful. | |
| ▲ | delaminator 3 days ago | parent | prev [-] | | It’s a formalisation of the method, and it’s in your global ~/.claude and also per project. I have mine in a GitHub template so I can even use them in Claude Code for the web. And synchronise them across my various machine (which is about 6 machines atm). |
|
|
|
| ▲ | marwamc 4 days ago | parent | prev | next [-] |
| My understanding is this: A skill is made up of SKILL.md which is what tells claude how and when to use this skill. I'm a bit of a control freak so I'll usually explicitly direct claude to "load the wireframe-skill" and then do X. Now SKILL.md can have references to more finegrained behaviors or capabilities of our skill. My skills generally tend to have a reference/{workflows,tools,standards,testing-guide,routing,api-integration}.md. These references are what then gets "progressively loaded" into the context. Say I asked claude to use the wireframe-skill to create profileView mockup. While creating the wireframe, claude will need to figure out what API endpoints are available/relevant for the profileView and the response types etc. It's at this point that claude reads the references/api-integration.md file from the wireframe skill. After a while I found I didn't like the progressive loading so I usually direct claude to load all references in the skill before proceeding - this usually takes up maybe 20k to 30k tokens, but the accuracy and precision (imagined or otherwise ha!) is worth it for my use cases. |
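The "load all references up front" approach can be sketched in a few lines of Python (hypothetical, and assuming the reference/ layout described above):

```python
from pathlib import Path

def load_all_references(skill_dir):
    """Eagerly read every reference file in a skill, instead of letting
    the agent progressively load them one at a time. Trades ~20-30k
    tokens of context for fewer mid-task file reads."""
    refs = {}
    for ref in sorted(Path(skill_dir, "reference").glob("*.md")):
        refs[ref.name] = ref.read_text()
    return refs
```

In practice you would not run this yourself; you direct the agent to "read every file under reference/ before proceeding," which amounts to the same thing.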
| |
| ▲ | kxrm 4 days ago | parent | next [-] | | > I'm a bit of a control freak so I'll usually explicitly direct claude to "load the wireframe-skill" and then do X. You shouldn't do this, it's generally considered bad practice. You should be optimizing your skill description. Often, if I am working with Claude Code and it doesn't load a skill, I ask it why it missed the skill. It will guide me to improving the skill description so that it is picked up properly next time. This iteration on skill descriptions has allowed skills to stay out of context until they are needed, rather predictably for me so far. | | |
| ▲ | adastra22 4 days ago | parent | next [-] | | There are different ways to use the tool. If you chat with the model, you want it to naturally pick the right tool to use based on vibes and context so you don’t have to repeat yourself. If you are plugging a call it Claude code within a larger, structured workflow, you want the tool selection to be deterministic. | |
| ▲ | rane 3 days ago | parent | prev [-] | | It's not enough. Sometimes skills just randomly won't be invoked. |
| |
| ▲ | chrisweekly 3 days ago | parent | prev [-] | | My understanding is that use of "description" frontmatter is essential, because Claude Code can read just the description without loading the entire file into context. |
|
|
| ▲ | taytus 4 days ago | parent | prev | next [-] |
| Easy, let me try to explain:
You want to achieve X, so you ask your AI companion, "How do I do X?"
Your companion thinks and tries a couple of things, and they eventually work.
So you say, "You know what, next time, instead of figuring it out, just do this"... that is a skill. A recipe for how to do things. |
|
| ▲ | jmalicki 4 days ago | parent | prev | next [-] |
Yes. I find these very useful for enforcing e.g. skills like debugging, committing code, making PRs, responding to PR feedback from AI review agents, etc. without constantly polluting the context window. So when it's time to commit, make sure you run these checks, write a good commit message, etc. Debugging is especially useful since AI agents can often go off the rails and go into loops rewriting code - so in a skill I can push for "read the log messages. Insert some more useful debug assertions to isolate the failure. Write some more unit tests that are more specific." Etc. |
|
| ▲ | canadiantim 4 days ago | parent | prev [-] |
I think it’s also important to think of skills in the context of tasks: when you want an agent to perform a specialized task, this is the context, the resources, and the scripts it needs to perform that task. |
| |
| ▲ | hadlock 4 days ago | parent [-] | | I'm excited to use this with the Ghidra CLI mode to rapidly decompile physics engines from various games. Do I want my flight simulator to behave like the Cessna in Flight Simulator 3.0 in the air? Codex can already do that. Do I want the plane to handle like Yoshi from Mario Kart 64 when taxiing? It hasn't been done yet but Claude Code is apparently pretty good at pulling apart N64 ROMs so that seems within the realm of possibility. |
|