kelseyfrog 4 days ago
If you squint, it's a fixed-iteration ODE solver. I'd love to see a generalization of this, and of the Universal Transformer mentioned, reimagined as flow-matching/optimal-transport models.
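To make the "fixed-iteration ODE solver" reading concrete: a block with shared weights applied N times through a residual connection, h ← h + f(h), does the same arithmetic as forward Euler on dh/dt = f(h) with step size 1 and N steps. A minimal sketch, assuming a toy elementwise function (`tied_block`, hypothetical) stands in for a real transformer block:

```python
import math
from typing import Callable, List

Vector = List[float]


def tied_block(h: Vector) -> Vector:
    """Hypothetical residual branch f(h); a toy stand-in for one shared transformer block."""
    return [math.tanh(x) - 0.5 * x for x in h]


def looped_model(h: Vector, f: Callable[[Vector], Vector], n_iters: int) -> Vector:
    """Apply the same (weight-tied) residual block n_iters times: h <- h + f(h)."""
    for _ in range(n_iters):
        h = [x + dx for x, dx in zip(h, f(h))]
    return h


def euler_solve(h: Vector, f: Callable[[Vector], Vector], t_end: float, n_steps: int) -> Vector:
    """Forward Euler for dh/dt = f(h) on [0, t_end] with a fixed number of steps."""
    dt = t_end / n_steps
    for _ in range(n_steps):
        h = [x + dt * dx for x, dx in zip(h, f(h))]
    return h


h0 = [0.3, -1.2, 2.0]
# With t_end == n_steps the step size is 1.0, so both loops do identical arithmetic.
assert looped_model(h0, tied_block, 8) == euler_solve(h0, tied_block, 8.0, 8)
```

Picking a different `t_end` for the same `n_steps` amounts to scaling the residual branch by `t_end / n_steps`; a generalized version could learn that step size or choose the number of steps per input.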
kevmo314 4 days ago
How would flow matching work? In language, we have inputs and outputs, but it's not clear what the intermediate points would be, since it's a discrete space.
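In continuous flow matching the intermediate points come from a chosen path between a source sample x0 and a data sample x1; a common choice is the straight line x_t = (1 − t)·x0 + t·x1, whose velocity x1 − x0 is the regression target. Applying that path naively to one-hot token vectors illustrates the problem raised here. A sketch, assuming a toy 5-token vocabulary and one-hot encoding (all names hypothetical):

```python
from typing import List

Vector = List[float]


def one_hot(index: int, vocab_size: int) -> Vector:
    """Encode a token id as a one-hot vector over a (toy) vocabulary."""
    return [1.0 if i == index else 0.0 for i in range(vocab_size)]


def interpolate(x0: Vector, x1: Vector, t: float) -> Vector:
    """Straight-line interpolation path: x_t = (1 - t) * x0 + t * x1."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0: Vector, x1: Vector) -> Vector:
    """d x_t / dt along the straight path; the model's vector field is regressed onto this."""
    return [b - a for a, b in zip(x0, x1)]


def is_token(x: Vector) -> bool:
    """True only if x is exactly one-hot, i.e. it corresponds to a single discrete token."""
    return all(v in (0.0, 1.0) for v in x) and sum(x) == 1.0


x0 = one_hot(2, 5)  # hypothetical source token id 2
x1 = one_hot(4, 5)  # hypothetical target token id 4
mid = interpolate(x0, x1, 0.5)
print(mid, is_token(mid))  # [0.0, 0.0, 0.5, 0.0, 0.5] False
```

Only the endpoints are tokens. The midpoint is still a valid probability distribution over the vocabulary, so defining the path over distributions (or over continuous embeddings) rather than over tokens is one way to give intermediate points a meaning.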
cfcf14 4 days ago
This makes me think it would be nice to see some kind of hybrid of modern transformer architectures and neural ODEs. There was really interesting work a few years ago on how neural ODEs/PDEs can be seen as a continuous limit of layer depth. Maybe models could learn cool things if the embeddings were themselves solutions of some dynamical system.
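The "continuous limit of layer depth" idea shows up even in a toy case: stack N residual layers, each taking a step of size 1/N, and as N grows the output converges to the solution of dh/dt = f(h) at t = 1. A sketch, assuming a hypothetical scalar linear layer f(h) = a·h, chosen only because its exact solution h0·e^a is known:

```python
import math


def residual_stack(h0: float, a: float, depth: int) -> float:
    """`depth` residual layers, each h <- h + (1/depth) * f(h), with toy layer f(h) = a * h."""
    dt = 1.0 / depth  # total "time" is fixed at 1 and split evenly across layers
    h = h0
    for _ in range(depth):
        h = h + dt * (a * h)
    return h


def ode_solution(h0: float, a: float, t: float = 1.0) -> float:
    """Exact solution of dh/dt = a * h at time t: h0 * exp(a * t)."""
    return h0 * math.exp(a * t)


for depth in (1, 10, 100, 1000):
    err = abs(residual_stack(1.0, 1.0, depth) - ode_solution(1.0, 1.0))
    print(f"depth={depth:5d}  error={err:.5f}")
```

The error shrinks roughly in proportion to 1/depth, as expected for a first-order (Euler) scheme; a neural-ODE model would instead hand a learned f to an ODE solver and treat depth as integration time.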