trcf22 4 days ago

Great job! Would it be possible to know what the cost of training such a model was?

menaerus 3 days ago | parent

From their report:

> Once a production environment has been set up, we estimate that the model can be realistically trained in approximately 90 days on 4096 GPUs, accounting for overheads. If we assume 560 W power usage per Grace-Hopper module in this period, below the set power limit of 660 W, we can estimate 5 GWh power usage for the compute of the pretraining run.
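A quick back-of-the-envelope check of that 5 GWh figure, using only the numbers stated in the quote (4096 modules at an assumed average of 560 W for 90 days):

    # Sanity check of the ~5 GWh estimate from the report.
    gpus = 4096       # Grace-Hopper modules
    power_w = 560     # assumed average power draw per module, in watts
    days = 90         # estimated pretraining duration

    hours = days * 24
    energy_wh = gpus * power_w * hours   # total energy in watt-hours
    energy_gwh = energy_wh / 1e9         # convert to gigawatt-hours
    print(f"{energy_gwh:.2f} GWh")       # ~4.95 GWh, i.e. roughly 5 GWh

So the quoted estimate is consistent with the stated GPU count, power draw, and duration.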