Remix.run Logo
djsjajah 8 hours ago

No. It seems to me that the comment is objectively incorrect. The original comment was talking about inference and from what I can tell, it is strictly going to run slower than the model trained to the same loss without this approach (it has "minimal overhead"). The main point is that you wont need to train that model for as long.