nayroclade 12 hours ago

Is the approach fundamentally limited to smaller models? Or could you theoretically train a model as powerful as the largest models, but much faster?