Remix.run Logo
andy12_ a day ago

Kind-of. You could theoretically use LoRA for this, in fact, but it probably wouldn't have enough capacity to make it a proper substitute of the attention mechanism. Instead a full MLP is trained as input chunks get processed.