storus 2 hours ago

> to decide halfway through following my detailed instructions that it would be "simpler" to just... not do what I asked

That's likely coming from the 3:1 ratio of linear to quadratic attention layers. The latest DeepSeek also suffers from it, which the original R1 never did.