| ▲ | bastawhiz 6 hours ago | |
That doesn't make any sense. If the cost is training, the cost is compute, not data. Distilling another model means paying for that model, which isn't obviously less expensive than crawling the web and curating a dataset. Regardless of where your data comes from, the training process costs the same. | ||
| ▲ | what 3 hours ago | parent [-] | |
> paying for that model They probably just used 100000000 free plans. | ||