dacox 4 hours ago:
Yeah, I have been seeing lots of comments, tweets, etc., but given everything I have learned about these models, I do not think the change to 1M was innocuous. I'm not sure what they've claimed publicly, but I'm fairly certain they must be doing additional quantization, or at minimum additional quantization of the KV cache. Plus, a longer maximum sequence length can change behavior even when the context isn't fully utilized. I had to manually re-enable the "clear context and continue" feature as well.
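Back-of-the-envelope, with assumed model dimensions (nothing vendor-confirmed), the per-request KV cache at 1M tokens gets enormous, which is why some form of KV-cache quantization seems plausible:

    # Rough KV-cache sizing sketch. All dimensions below are assumptions
    # for illustration, not any vendor's published architecture.
    def kv_cache_bytes(seq_len, n_layers=80, n_kv_heads=8, head_dim=128, bytes_per_elem=2):
        # 2 tensors (K and V) per layer, each [seq_len, n_kv_heads, head_dim]
        return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

    for seq_len in (200_000, 1_000_000):
        fp16 = kv_cache_bytes(seq_len, bytes_per_elem=2)
        int8 = kv_cache_bytes(seq_len, bytes_per_elem=1)
        print(f"{seq_len:>9,} tokens: fp16 KV ~ {fp16 / 2**30:.0f} GiB, int8 KV ~ {int8 / 2**30:.0f} GiB")

With these made-up numbers, a single 1M-token request needs roughly 305 GiB of KV cache at fp16 versus ~153 GiB at int8, so the pressure to quantize (or otherwise cut corners) at that length is real.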
giancarlostoro 4 hours ago (parent):
I used the heck out of it when it was announced, and it felt like one of the best models I've ever used, but so did all of their other customers. I don't think they accounted for such heavy load, or maybe follow-up changes goofed something up, not sure. Like I said, for the first few days the 1M token context let me bust out some interesting projects in a single session, from nothing to "oh my" in no time. I think they should go back to all their old settings, cap users at the old token limit, and ask whether you want to compact at that "soft" limit or burst a little longer to finish a task.