mvieira38 8 days ago
Related: There was buzz last year about Kolmogorov-Arnold Networks (KANs), and https://arxiv.org/abs/2409.10594 claimed that KANs perform better than standard MLPs in the transformer architecture. Does anyone know whether these are being explored in the LLM space? If I'm not mistaken, KANs seem to have better properties regarding memory.