jjani | 5 days ago
It appears to be overtuned for extremely strict instruction following, interpreting things in a very inhuman way, which may benefit agentic tasks at the cost of everything else. My limited API testing with gpt-5 also showed this. As an example, the instruction "don't use academic language" caused it to omit roughly half of what it would have output without that instruction. The other frontier models, and even open-source Chinese ones like Kimi and DeepSeek, understand perfectly well what we mean by it.
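For anyone who wants to reproduce this, a minimal sketch of the kind of A/B test described, assuming the standard OpenAI Python SDK; the instruction and model name come from the comment above, while the prompt and the comparison are illustrative:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    PROMPT = "Explain how transformers handle long-range dependencies."

    def ask(system: str | None) -> str:
        # Same user prompt, with and without the system instruction.
        messages = []
        if system:
            messages.append({"role": "system", "content": system})
        messages.append({"role": "user", "content": PROMPT})
        resp = client.chat.completions.create(model="gpt-5", messages=messages)
        return resp.choices[0].message.content

    baseline = ask(None)
    constrained = ask("Don't use academic language.")

    # The complaint is that the constrained answer drops a large chunk
    # of the content instead of rephrasing it in plainer terms, which a
    # crude word count already makes visible.
    print(len(baseline.split()), len(constrained.split()))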
int_19h | 5 days ago | parent
It's not great at agentic tasks either, not least because it seems very timid about doing things on its own, and demands (not asks - demands) that the user confirm every tiny step.