JBrussee-2 2 hours ago
Author here. A few people are arguing against a stronger claim than the repo is meant to make. Also, this was very much intended as a joke, not research-level commentary.

This skill is not intended to reduce hidden reasoning / thinking tokens. Anthropic’s own docs suggest more thinking budget can improve performance, so I would not claim otherwise. What it targets is the visible completion: less preamble, less filler, less polished-but-nonessential text. Since only the post-completion output is “cavemanned”, the code hasn’t been affected by the skill at all :)

Also surprising to hear so little faith in RL. Quite sure that the models from Anthropic have been so heavily tuned to be coding agents that you cannot “force” a model to degrade immensely.

The fair criticism is that my “~75%” README number is from preliminary testing, not a rigorous benchmark. That should be phrased more carefully, and I’m working on a proper eval now.

Also yes, skills are not free: Anthropic notes they consume context when loaded, even if only skill metadata is preloaded initially. So the real eval is end-to-end:

- total input tokens
- total output tokens
- latency
- quality/task success

There is actual research suggesting concise prompting can reduce response length substantially without always wrecking quality, though it is task-dependent and can hurt in some domains. (https://arxiv.org/html/2401.05618v3)

So my current position is: interesting idea, narrower claim than some people think, needs benchmarks, and the README should be more precise until those exist.
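For anyone who wants to picture the end-to-end comparison listed above, here is a minimal sketch in Python. It is not taken from the repo: `RunResult`, `summarize`, and `compare` are hypothetical names, and the per-run numbers are assumed to come from whatever harness you already use to run the same tasks with and without the skill.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RunResult:
    """One task run. All fields are supplied by your own harness (hypothetical)."""
    input_tokens: int    # total prompt/context tokens, including any loaded skill text
    output_tokens: int   # total completion tokens
    latency_s: float     # wall-clock seconds for the run
    success: bool        # did the task pass its check?


def summarize(runs):
    """Aggregate the four end-to-end metrics over a list of runs."""
    if not runs:
        raise ValueError("need at least one run")
    return {
        "input_tokens": sum(r.input_tokens for r in runs),
        "output_tokens": sum(r.output_tokens for r in runs),
        "mean_latency_s": mean(r.latency_s for r in runs),
        "success_rate": sum(r.success for r in runs) / len(runs),
    }


def compare(baseline, with_skill):
    """Report each metric for both conditions plus the relative change."""
    b, s = summarize(baseline), summarize(with_skill)
    report = {}
    for key in b:
        # Relative change is undefined when the baseline value is zero.
        change = (s[key] - b[key]) / b[key] if b[key] else None
        report[key] = {"baseline": b[key], "skill": s[key], "relative_change": change}
    return report
```

The point of reporting all four metrics side by side is that a drop in output tokens only counts as a win if input tokens, latency, and success rate do not move the wrong way by more.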
dataviz1000 19 minutes ago
If you want to benchmark, consider this: https://github.com/adam-s/testing-claude-agent
Chance-Device 2 hours ago
Sounds reasonable to me. I think this thread is just the way online discourse tends to go. Actually it’s probably better than average, but still sometimes disappointing.
bdbdbdb 2 hours ago
Translation: It joke. No yell at me. It kind of work?
federicosimoni an hour ago
[dead]
nullc an hour ago
> Quite sure that the models from Anthropic have been so heavily tuned to be coding agents that you cannot “force” a model to degrade immensely.

The rest of what you're saying sounds fine, but that remark seems confused to me. Prefix your prompt with "be a moron that does everything wrong and only superficially look like you're doing it correctly. make constant errors." Of course you can degrade the performance; the question is whether any particular 'output styling' actually does, and to what extent.