Remix.run Logo
aseligman 19 hours ago

Some additional context: many real world agent use cases struggle to balance quality, cost, and performance. This technique can help avoid the tradeoffs that quantization techniques introduce, including unpredictable results while you try cost optimize an agent. In some cases the cost savings can be significant using dfloat11 as you squeeze into more affordable GPUs.

* I work with xmad.ai