ScalarLM uses Tokenformer adaptors by default; these add learnable key/value parameters that the input attends to.
https://www.scalarlm.com/blog/tokenformer-a-scalable-transfo...
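
To make "learnable key/values" concrete, here is a minimal sketch of the general idea: the input acts as a query and attends over a small set of trainable key and value rows, and the result is added back to the input. This is not ScalarLM's actual code. The class name `KeyValueAdapter`, the `dim` and `num_param_tokens` parameters, the plain softmax scoring, and the zero initialization of the values are all assumptions made for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class KeyValueAdapter:
    """Hypothetical adapter: the input attends over learnable key/value rows."""

    def __init__(self, dim, num_param_tokens, seed=0):
        rng = random.Random(seed)
        # Learnable keys: one row per parameter token, small random init.
        self.keys = [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                     for _ in range(num_param_tokens)]
        # Learnable values: zero init (an assumption) so the adapter
        # starts out as a no-op on top of the base model.
        self.values = [[0.0] * dim for _ in range(num_param_tokens)]

    def __call__(self, x):
        """x is one token embedding (list of floats); returns x plus the update."""
        dim = len(x)
        # Scaled dot product between the input (query) and each learnable key.
        scores = [sum(xi * ki for xi, ki in zip(x, key)) / math.sqrt(dim)
                  for key in self.keys]
        weights = softmax(scores)
        # Weighted sum of the learnable value rows.
        update = [sum(w * value[j] for w, value in zip(weights, self.values))
                  for j in range(dim)]
        # Residual connection: base activation plus adapter output.
        return [xi + ui for xi, ui in zip(x, update)]


adapter = KeyValueAdapter(dim=4, num_param_tokens=3)
x = [1.0, 0.5, -0.5, 2.0]
y = adapter(x)  # equals x while the values are still zero
```

In this sketch only `keys` and `values` would be trained while the base model stays frozen, and adding capacity means appending more rows to both.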