Remix.run Logo
andes314 4 hours ago

Linear time attention doesn’t work, by principle. Dead end pursuit. Much great research on more efficient quadratic time inference

smokel 3 hours ago | parent [-]

What about n log n?