prmph 5 hours ago
They all are. And once the context has rotted or been poisoned enough, it is unsalvageable. Claude is now actually one of the better ones at instruction following, I daresay.
XCSme 5 hours ago
In my tests, its weakest area is adding extra formatting or output nobody asked for: https://aibenchy.com/compare/anthropic-claude-opus-4-6-mediu...

For example, it sometimes answers in markdown without being asked to (e.g. "**13**" instead of "13"), even when told to respond with a number only. That might be fine in a chat environment, but not in a workflow, an agentic use case, or tool usage.

Yes, the overall shape can be enforced via structured output, but inside a string field of that structured output you might still want a specific natural-language response format, and that can't be defined by a schema.
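A minimal sketch of the kind of guard a workflow ends up needing, assuming the step expects a bare integer: it either strips one layer of stray markdown emphasis (the "**13**" case) or, in strict mode, rejects any formatting so the caller can retry. The helper name `parse_number_only` and the `strict` flag are hypothetical, not part of any model API.

```python
import re

# One layer of markdown emphasis or inline-code markers wrapping the whole reply,
# e.g. "**13**", "*13*", "__13__", "_13_", "`13`" (hypothetical normalization rule).
_EMPHASIS = re.compile(r"^(\*\*|__|\*|_|`)(.*)\1$")
_BARE_INT = re.compile(r"-?\d+")


def parse_number_only(reply: str, strict: bool = False) -> int:
    """Return the integer in `reply`, or raise ValueError.

    strict=True rejects any formatting (useful to detect the failure and retry);
    strict=False strips a single layer of emphasis/code markers before parsing.
    """
    text = reply.strip()
    if not strict:
        match = _EMPHASIS.match(text)
        if match:
            text = match.group(2).strip()
    if not _BARE_INT.fullmatch(text):
        raise ValueError(f"expected a bare number, got {reply!r}")
    return int(text)


if __name__ == "__main__":
    print(parse_number_only("13"))      # plain answer
    print(parse_number_only("**13**"))  # markdown-wrapped answer, normalized
    try:
        parse_number_only("**13**", strict=True)
    except ValueError as err:
        print("rejected:", err)
```

Whether to normalize or reject is a design choice: stripping keeps the pipeline moving, while strict mode surfaces the instruction-following failure so it can be retried or logged. Neither helps with the free-form string-field case, where the format can only be checked after the fact.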