geor9e 6 hours ago
If the trick were genuinely useful and had been well circulated months ago, the resource-starved inference providers would have squeezed it dry already, instead of wasting 60% of their tokens while waiting for users to implement it themselves with 5 minutes of effort.
Klathmon 3 hours ago
That's like saying quantization isn't real because the frontier labs aren't using it in their production inference. This is a lossy process; it produces worse results. It might be worth it in some situations, but applying it to everything would just make your SOTA model worse.
solenoid0937 5 hours ago
[flagged]