| ▲ | dchftcs an hour ago | |
I suspect this will be a significant problem blocking long-horizon tasks in practice: the more turns there are, the larger the chance that the classifier produces a false positive at some point. The user's disappointment will also scale with the length of the task, since you're in the middle of something complex and get derailed after having already paid for many tokens. | ||
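To put rough numbers on the "more turns, more chance" point, here is a minimal sketch. It assumes each turn is checked independently with the same per-turn false-positive rate, which is a simplification and not something known about any real classifier. The function name and the 0.1% rate are hypothetical, chosen only for illustration.

```python
def prob_any_false_positive(per_turn_rate: float, turns: int) -> float:
    """Chance that at least one of `turns` checks misfires.

    Assumes every turn is an independent check with the same
    false-positive rate (an illustrative assumption only).
    """
    # P(no false positive in any turn) = (1 - p) ** n,
    # so P(at least one) is the complement.
    return 1.0 - (1.0 - per_turn_rate) ** turns


if __name__ == "__main__":
    hypothetical_rate = 0.001  # 0.1% per turn, made up for the example
    for turns in (10, 100, 1000):
        chance = prob_any_false_positive(hypothetical_rate, turns)
        print(f"{turns:>5} turns: {chance:.1%} chance of at least one false positive")
```

Under those assumptions, a rate that looks negligible per turn works out to roughly 1% over 10 turns, about 10% over 100, and about 63% over 1000, so the longest tasks are the ones most likely to be interrupted.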