Remix.run Logo
OliverGuy 8 hours ago

Edit, it looks like the paper does

TPUv5e with 16 tensor cores for 2 days for the 200M param model.

Claude reckons this is 60 hours on a 8xA100 rig, so very accessibile compared to LLMs for smaller labs