prmph 5 hours ago

They all are. And once the context has rotted or been poisoned enough, it is unsalvageable.

Claude is now actually one of the better ones at instruction following, I daresay.

XCSme 5 hours ago

In my tests, it's the worst at adding extra formatting or output: https://aibenchy.com/compare/anthropic-claude-opus-4-6-mediu...

For example, it sometimes responds in markdown without being asked to (e.g. "**13**" instead of "13"), even when explicitly told to respond with a number only.

This might be fine in a chat environment, but not in a workflow, an agentic use case, or tool use.
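In a pipeline, one common defensive pattern is to parse "number only" replies strictly and decide explicitly whether to tolerate markdown wrapping. A minimal sketch, assuming the reply arrives as a plain string; `parse_number_only` and its `lenient` flag are hypothetical names, not part of any vendor SDK:

```python
import re

# A bare (optionally negative) integer, with surrounding whitespace allowed.
_BARE_INT = re.compile(r"\s*(-?\d+)\s*")
# Hypothetical leniency rule: a number wrapped in one matching markdown
# marker pair, e.g. **13**, *13*, __13__, _13_ or `13`.
_MD_WRAPPED = re.compile(r"\s*(\*\*|\*|__|_|`)(-?\d+)\1\s*")


def parse_number_only(reply: str, lenient: bool = False) -> int:
    """Parse a model reply that was supposed to be a number only.

    Strict mode accepts digits only; lenient mode also unwraps a number
    surrounded by a single markdown emphasis or code marker.
    """
    m = _BARE_INT.fullmatch(reply)
    if m:
        return int(m.group(1))
    if lenient:
        m = _MD_WRAPPED.fullmatch(reply)
        if m:
            return int(m.group(2))
    raise ValueError(f"reply is not a bare number: {reply!r}")
```

With this sketch, `parse_number_only("**13**")` raises `ValueError`, while `parse_number_only("**13**", lenient=True)` returns `13`. Failing loudly in strict mode makes the format violation visible (and retryable) instead of silently passing "**13**" downstream.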

Yes, the overall shape can be enforced via structured output, but even inside a string field of that output you may still want a specific natural-language format, and that can't be expressed in a schema.
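When the schema can only say "this field is a string", one option is to check the field's natural-language constraints yourself after parsing the JSON. A rough sketch, assuming a hypothetical `summary` field that should be one plain-text sentence; the markdown and sentence checks are deliberately crude heuristics (abbreviations like "e.g." will be miscounted), not a standard validator:

```python
import json
import re

# Heuristic markers of markdown: bold/underline runs, code ticks,
# headings, or list bullets at the start of a line.
_MARKDOWN_HINTS = re.compile(r"(\*\*|__|`|^#+\s|^\s*[-*]\s)", re.MULTILINE)
# Crude sentence terminator: . ! or ? followed by whitespace or end of text.
_SENTENCE_END = re.compile(r"[.!?](?:\s|$)")


def check_summary_field(raw_json: str, field: str = "summary") -> str:
    """Enforce format rules on a string field that a JSON schema cannot express."""
    data = json.loads(raw_json)
    value = data[field]
    if not isinstance(value, str):
        raise TypeError(f"{field!r} must be a string")
    text = value.strip()
    problems = []
    if _MARKDOWN_HINTS.search(text):
        problems.append("contains markdown")
    if "\n" in text:
        problems.append("spans multiple lines")
    if len(_SENTENCE_END.findall(text)) != 1:
        problems.append("is not exactly one sentence")
    if problems:
        raise ValueError(f"{field!r} " + ", ".join(problems) + f": {value!r}")
    return value
```

The schema still guarantees the JSON shape; this second pass only covers what the schema leaves open, and a failure can be fed back to the model as a retry instruction.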