dcre 9 hours ago
"Self-Generated Skills: No Skills provided, but the agent is prompted to generate relevant procedural knowledge before solving the task. This isolates the impact of LLMs’ latent domain knowledge" This is a useful result, but it is important to note that this is not necessarily what people have in mind when they think of "LLMs generating skills." Having the LLM write down a skill representing the lessons from the struggle you just had to get something done is more typical (I hope) and quite different from what they're referring to. I'm sure news outlets and popular social media accounts will use appropriate caution in reporting this, and nobody will misunderstand it. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
btown 9 hours ago
It's even worse than this: the "tasks" being evaluated are limited to a single markdown file of instructions plus an opaque verifier (pages 13-14). There are no problems involving existing codebases, refactors, or anything of the sort, where the key constraint is that the "problem definition" in the broadest sense doesn't fit in context.

So look at the prompt they gave the agent to generate its own skills:

> Important: Generate Skills First
>
> Before attempting to solve this task, please follow these steps:
>
> 1. Analyze the task requirements and identify what domain knowledge, APIs, or techniques are needed.
> 2. Write 1–5 modular skill documents that would help solve this task. Each skill should: focus on a specific tool, library, API, or technique; include installation/setup instructions if applicable; provide code examples and usage patterns; be reusable for similar tasks.
> 3. Save each skill as a markdown file in the environment/skills/ directory with a descriptive name.
> 4. Then solve the task using the skills you created as reference.

There's literally nothing it can do by way of "exploration" to populate and distill self-generated skills: not a web search, not exploring an existing codebase for best practices and key files, only its own hallucinations around the task description. And judging from that fourth bullet, they don't even seem to restart the session after the skills are generated, so the agent is just regurgitating the context that was used to generate them.

So yeah, your empty-codebase vibe-coding agent can't just "plan harder" and make itself better. But this is a misleading result for any other context, including the one where you ask for a second feature on that just-vibe-coded codebase in a fresh session.
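To make the single-session complaint concrete, here's a rough sketch of the two setups. The run_agent() helper is hypothetical (it just stands in for one agent session, i.e. one context window); nothing here is taken from the paper's actual harness.

    from pathlib import Path

    SKILLS_DIR = Path("environment/skills")

    def run_agent(prompt: str) -> str:
        """Hypothetical stand-in for a single agent session (one context window)."""
        raise NotImplementedError

    # What the paper's prompt appears to describe (if the session really isn't
    # restarted): skills are generated and the task is solved in one session,
    # so the "skills" never leave the context that produced them.
    def single_session(task: str) -> str:
        return run_agent(
            f"Write 1-5 skill files to {SKILLS_DIR}, then solve:\n{task}"
        )

    # What most skill workflows assume: distill the skills, then start a fresh
    # session that sees only the task plus the skill files read back from disk.
    def fresh_session(task: str) -> str:
        run_agent(f"Write 1-5 skill files to {SKILLS_DIR} for tasks like:\n{task}")
        skills = "\n\n".join(p.read_text() for p in sorted(SKILLS_DIR.glob("*.md")))
        return run_agent(f"Reference skills:\n{skills}\n\nNow solve:\n{task}")

If that fourth bullet really does mean no restart, it's the first function being measured, and that tells you little about the second.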
jonmagic an hour ago
Yeah, they've got it backwards. I tried to sum it up in thisistheway.to/ai, but what's been working for us is treating every agent miss as a learning opportunity:

1. Capture the miss: what did the agent do? What did reality say?
2. Diagnose: what didn't it see? Missing data, a constraint, feedback, or boundaries?
3. Choose a primitive: observability, instructions, tooling, guardrails, or verification?
4. Encode as an artifact: version-controlled and repeatable, not just memory.
5. Promote to a gate: when it's worth enforcing, make it a gate.

Every harness I set up includes this process in the primary set of agent instructions.
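As a rough illustration of steps 4 and 5 (every path and check below is made up for the example, not taken from our actual harness): the diagnosed miss gets written down as a version-controlled lesson file, and once it's worth enforcing, a small check script turns it into a gate.

    from pathlib import Path
    import sys

    LESSONS = Path("agent/lessons")

    def encode_lesson(name: str, miss: str, fix: str) -> Path:
        """Step 4: encode the diagnosed miss as a version-controlled artifact."""
        LESSONS.mkdir(parents=True, exist_ok=True)
        path = LESSONS / f"{name}.md"
        path.write_text(
            f"# {name}\n\n"
            f"## What the agent missed\n{miss}\n\n"
            f"## What to do instead\n{fix}\n"
        )
        return path

    def gate(changed_files: list[str]) -> int:
        """Step 5: promote the lesson to a gate, e.g. fail CI when its condition
        is violated (here: a migration changed without the required note)."""
        touched_migrations = any(f.startswith("db/migrations/") for f in changed_files)
        if touched_migrations and not Path("docs/MIGRATION_NOTES.md").exists():
            print("Gate failed: migration changed without a migration note.")
            return 1
        return 0

    if __name__ == "__main__":
        sys.exit(gate(sys.argv[1:]))

In CI you'd feed it the changed files, e.g. python gate.py $(git diff --name-only origin/main...HEAD).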
zozbot234 8 hours ago
The point of so-called "skills" is to be short how-to reminders that the agent can pull into its context and then act on. If the knowledge is already in the model, it will most likely surface during the reasoning phase anyway, so there's little benefit to writing it up as a skill, unless perhaps it's extremely relevant but hard to surface and you want the model to skip that part of the reasoning.
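Mechanically, that's all a skill is: a short markdown file whose body gets injected into the prompt when it looks relevant. A minimal sketch (hypothetical names and a deliberately naive relevance check, not any particular vendor's API):

    from pathlib import Path

    def load_relevant_skills(task: str, skills_dir: str = "environment/skills") -> str:
        """Pull in only the skill files whose names appear in the task text."""
        picked = []
        for path in sorted(Path(skills_dir).glob("*.md")):
            if path.stem.replace("-", " ") in task.lower():
                picked.append(path.read_text())
        return "\n\n".join(picked)

    def build_prompt(task: str) -> str:
        skills = load_relevant_skills(task)
        header = f"Reference how-tos:\n{skills}\n\n" if skills else ""
        return header + f"Task:\n{task}"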
isahers 9 hours ago
Yeah, I care about LLMs generating skills after attempting tasks and learning lessons from those attempts, not before attempting a task for the first time. This result seems a little silly and detached from how skills are actually "auto-generated" in the real world.
JamesSwift 8 hours ago
Yeah, some of my most useful AI tooling is skills created via a "role play session": basically brain-dumping to the agent and telling it to ask questions and figure out how to accomplish a task, then distilling it into a skill at the end, which is much tighter and evidence-based because it comes out of the actual problem-solving session.
dalemhurley 4 hours ago
After several failures and then a success, I have the agent create the skill; on the next run it succeeds on the first attempt.
neya 4 hours ago
> I'm sure news outlets and popular social media accounts will use appropriate caution in reporting this, and nobody will misunderstand it.

You mean the dude who writes articles on TechCrunch and Ars Technica based off of HN and Reddit thread titles because he doesn't understand what real journalism is? Sure, we can count on him :)
ericol 8 hours ago
> Having the LLM write down a skill representing the lessons from the struggle you just had to get something done is more typical (I hope) and quite different from what they're referring to

Just last week I had Claude build me a skill for when I ask it to help me troubleshoot issues, and it came out quite good. It did have some issues (Claude tends to over-specify from anecdotal data), but it's a strong step in the right direction.

Also, "skills" are too broad in my opinion. I have one (that Claude wrote) with my personal data, which I have available when I analyze my workouts.

I think there's ample room for self-generated skills when you've had a rather long exchange on a domain you plan to revisit, _especially_ when it comes to telling Claude what not to do.
JumpCrisscross 7 hours ago
> it is important to note that this is not necessarily what people have in mind when they think of "LLMs generating skills"

I'm reading this paper as "don't do this." If you deploy agents to your workforce and tell people to use skills, don't; tell them to give the agent tasks. This sounds obvious, but it might not be to everyone.

(And in any case, it's nice for researchers to have confirmed that pre-prompt skill writing doesn't work. It would have been neat if it had.)
somesortofthing 6 hours ago
I interpreted it as "allowing the LLM to add skills to itself as it completes a task doesn't provide a meaningful improvement over just letting it reason normally," which seems to be what the paper is fundamentally getting at.
nubg 6 hours ago
> I'm sure news outlets and popular social media accounts will use appropriate caution in reporting this, and nobody will misunderstand it.

:D