storus 3 hours ago
Based on the current DeepSeek website, I suspect it's not going to be great: their current model (V3.4? V4-mini?) often forgets or changes facts explicitly mentioned in the conversation, which R1 never did. It's better than R1 at math or coding, but nearly unusable for deep conversation. I suspect they pushed MLA or linear attention too far, or are quantizing a lot more than before.