| ▲ | jchandra 2 days ago | |
In this prototype, OLS + SVD isn’t per-token; it runs only when the recycle bin fills, so its cost is amortized over multiple tokens. That said, it’s still heavier than Top-K. I haven’t benchmarked end-to-end latency yet; this is mainly exploring the accuracy vs. memory tradeoff. | ||
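
To make the amortization point concrete, here is a rough sketch, not the prototype's actual code. `RecycleBin`, `capacity`, `rank`, and the way SVD and OLS are combined (SVD picks a low-rank basis, OLS fits coefficients onto it) are all assumptions for illustration; the real pipeline may differ.

```python
import numpy as np


class RecycleBin:
    """Hypothetical buffer of evicted vectors that compresses only when full."""

    def __init__(self, capacity, rank):
        self.capacity = capacity    # evicted vectors to collect before compressing
        self.rank = rank            # target rank of the compressed summary
        self.pending = []           # vectors waiting for the next compression
        self.compressed = []        # list of (coeffs, basis) pairs
        self.compress_calls = 0     # how often the heavy step actually ran

    def add(self, vec):
        # Cheap per-token path: just append. The heavy step runs only on fill.
        self.pending.append(np.asarray(vec, dtype=np.float64))
        if len(self.pending) >= self.capacity:
            self._compress()

    def _compress(self):
        X = np.stack(self.pending)                       # (capacity, dim)
        # Assumed use of SVD: top right-singular vectors as a low-rank basis.
        _, _, vt = np.linalg.svd(X, full_matrices=False)
        basis = vt[: self.rank]                          # (rank, dim)
        # Assumed use of OLS: least-squares fit of each vector onto the basis.
        coeffs, *_ = np.linalg.lstsq(basis.T, X.T, rcond=None)
        self.compressed.append((coeffs.T, basis))        # coeffs.T: (capacity, rank)
        self.pending.clear()
        self.compress_calls += 1


rng = np.random.default_rng(0)
recycle_bin = RecycleBin(capacity=4, rank=2)
for _ in range(10):                                      # 10 evicted tokens
    recycle_bin.add(rng.standard_normal(8))
print(recycle_bin.compress_calls, len(recycle_bin.pending))
```

With `capacity=4`, ten evictions trigger the SVD + OLS step twice instead of ten times, which is the amortization described above. Each call is still much more work than a Top-K selection, so this says nothing about end-to-end latency.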