KVarN: Native vLLM KV-cache quantization back end by Huawei (github.com)
46 points by theanonymousone 2 hours ago | 6 comments
throwa356262 an hour ago
Better performance than TQ and better quality than FP16? Am I reading this right?
v3ss0n an hour ago
Why is this not a PR for vLLM?