| ▲ | minimaxir 3 hours ago | |
We really need a replacement for all-MiniLM-L12-v2 that can create more robust embeddings with the same compute. You can technically do Q4 quantization for larger embedding models, but I am not sure if that plays nicely with ONNX. | ||
| ▲ | electroglyph 2 hours ago | parent | prev [-] | |
it's a pain in the ass to do properly. what we really need is something like auto-round for ONNX | ||
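For readers unfamiliar with what "Q4 quantization" means in this thread, here is a minimal sketch of the naive baseline: symmetric round-to-nearest 4-bit quantization with one scale per block of weights. It is pure Python and not tied to ONNX or any quantization library. The function names (`quantize_q4`, `dequantize_q4`, `cosine`) and the block size of 32 are assumptions made for illustration, not anything from the comments above. Tools like auto-round exist because this plain rounding tends to lose more accuracy than a tuned rounding scheme would.

```python
import math
import random

def quantize_q4(weights, block_size=32):
    """Symmetric round-to-nearest 4-bit quantization, one scale per block.

    Hypothetical helper for illustration only. Returns integer codes in
    [-8, 7] and one float scale per block.
    """
    codes, scales = [], []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        max_abs = max(abs(w) for w in block)
        # Map the largest magnitude in the block to code 7 (or -7).
        scale = max_abs / 7.0 if max_abs > 0 else 1.0
        scales.append(scale)
        # Clamp to the signed 4-bit range after rounding.
        codes.extend(max(-8, min(7, round(w / scale))) for w in block)
    return codes, scales

def dequantize_q4(codes, scales, block_size=32):
    """Reconstruct approximate float weights from codes and block scales."""
    return [c * scales[i // block_size] for i, c in enumerate(codes)]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-in for one row of a weight matrix; real weights are not Gaussian.
    weights = [rng.gauss(0.0, 1.0) for _ in range(384)]
    codes, scales = quantize_q4(weights)
    restored = dequantize_q4(codes, scales)
    print("cosine similarity after Q4 round trip:", cosine(weights, restored))
```

Each weight ends up within half a scale step of its original value, so the per-block scale sets the error budget. Whether a model quantized this way still produces usable embeddings, and whether a given ONNX runtime can execute the 4-bit weights efficiently, are separate questions this sketch does not answer.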