minimaxir 3 hours ago

We really need a replacement for all-MiniLM-L12-v2 that can create more robust embeddings with the same compute.

You can technically do Q4 quantization for larger embedding models but I am not sure if that plays nice with ONNX.

electroglyph 2 hours ago

it's a pain in the ass to do properly.

what we really need is something like auto-round for ONNX