| ▲ | fheinsen 2 hours ago | |
Yes, there must be a connection. While adaptive truncation may prove impractical, it should be possible to measure spectral statistics on sample data, and specify a different fixed truncation order per layer, per head, etc. The github repository lists many other possible improvements: https://github.com/glassroom/sata_attention#proof-of-concept | ||