simianwords 2 hours ago

Are you sure you can use tps as a proxy?
jychang an hour ago | parent
In practice, tps is a reflection of VRAM memory bandwidth during inference, so tps tells you a lot about the hardware you're running on. Comparing tps ratios (saying one model is roughly 2x faster or slower than another) can tell you a lot about the active param count.

I won't say it tells you everything; I have no clue what optimizations Opus may have, which could range from native FP4 experts to spec decoding with MTP to whatever. But considering Chinese models like DeepSeek and GLM have MTP layers (no clue if Qwen 3.5 has MTP, I haven't checked since its release), and Kimi is native int4, I'm pretty confident there is not a 10x difference between Opus and the Chinese models. I'd say there's roughly a 2x-3x difference between Opus 4.5/4.6 and the Chinese models at most.
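The bandwidth argument above can be sketched in a few lines. This is a minimal back-of-envelope model under the assumption that decoding is purely memory-bandwidth-bound: tokens/sec is capped at bandwidth divided by the bytes of active weights read per token. All numbers below (the ~3000 GB/s bandwidth, the 37B active param count) are illustrative assumptions, not measurements of any real model or GPU, and the model ignores KV cache traffic, batching, and spec decoding.

```python
def est_tps(bandwidth_gbps: float, active_params_b: float, bytes_per_weight: float) -> float:
    """Rough tokens/sec ceiling for a bandwidth-bound decoder.

    bandwidth_gbps:   memory bandwidth in GB/s (hypothetical hardware)
    active_params_b:  active parameters per token, in billions
    bytes_per_weight: 2.0 for FP16/BF16, 1.0 for FP8/int8, 0.5 for int4
    """
    # GB/s divided by GB of weights touched per generated token
    return bandwidth_gbps / (active_params_b * bytes_per_weight)

# Same hypothetical accelerator, same active param count, two weight formats:
tps_fp8 = est_tps(3000, 37, 1.0)   # FP8/int8 weights
tps_int4 = est_tps(3000, 37, 0.5)  # native int4, halves the bytes per token
print(f"{tps_fp8:.0f} vs {tps_int4:.0f} tps, ratio {tps_int4 / tps_fp8:.1f}x")
```

The point is that the tps ratio between two models on comparable hardware tracks bytes moved per token (active params × precision), not total param count, which is why int4 weights or MTP-style speculation narrows the gap.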