ImHereToVote 8 hours ago

I wonder how much GPU compute you would need to create a public domain version of this. It would be really valuable for the general public.

wongarsu 5 hours ago | parent [-]

To get to a single knowledge cutoff they spent 16.5 wall-clock hours on a cluster of 128 NVIDIA GH200 GPUs (about 2100 GPU-hours), plus a minor amount of time for finetuning. The prerelease_notes.md in the repo is a great description of how one would achieve that.

IanCal 5 hours ago | parent [-]

I know there are going to be a lot of complications in this, but a quick search suggests these GPUs rent for ~$2/hr, so roughly $4,000-4,500 if you don't already have access to a cluster. I don't know how important the cluster is here: whether training requires some minimum number of GPUs (so a single machine would take much more than 128x longer, or wouldn't work at all), or whether a cluster of 128 GPUs is less efficient per GPU but simply faster overall. A 4B model feels like it'd be fine on one to two of those GPUs?
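The estimate above is simple enough to check directly. A minimal sketch of the arithmetic, where the $2/hr rate is the assumed rental price from the comment (not a quoted figure), and the GPU count and wall-clock time come from the parent comment:

```python
# Back-of-envelope cost check for the training run described above.
gpus = 128                 # cluster size from the parent comment
wall_clock_hours = 16.5    # wall-clock time from the parent comment
hourly_rate_usd = 2.0      # assumed per-GPU rental rate (~$2/hr)

gpu_hours = gpus * wall_clock_hours        # 128 * 16.5 = 2112 GPU-hours
cost_usd = gpu_hours * hourly_rate_usd     # total rental cost at that rate

print(f"{gpu_hours:.0f} GPU-hours, about ${cost_usd:,.0f}")
```

At the assumed rate this lands at about $4,200, consistent with the $4,000-4,500 range, and it scales linearly with whatever hourly price you can actually get.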

Also, of course, this is for one training run; if you need to experiment, you'd need to do several.