storus 2 hours ago

> to decide halfway through following my detailed instructions that it would be "simpler" to just... not do what I asked

That's likely coming from the 3:1 ratio of linear to quadratic attention usage. The latest DeepSeek also suffers from it, which the original R1 never exhibited.