soupspaces 6 hours ago

Universal approximation theorem, embeddings, self-attention, gradient descent. And empirically, scaling laws.