DoctorOetker 2 days ago

pruning and distilling are two totally different things.

pruning: discarding low-magnitude connections after training. this makes the network sparser, but also less regular, which complicates the memory layout and the compute kernels that have to access the sparse weights.
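
a minimal sketch of magnitude pruning in PyTorch (the layer size and 90% pruning ratio here are made up for illustration):

    import torch

    # a trained layer's weight matrix (stand-in for illustration)
    weight = torch.randn(512, 512)

    # zero out the 90% of connections with the smallest magnitude
    k = int(0.9 * weight.numel())
    threshold = weight.abs().flatten().kthvalue(k).values
    pruned = weight * (weight.abs() > threshold)

    # mostly zeros now, but still stored densely; actually exploiting
    # the sparsity needs a sparse format and matching compute kernels
    sparse = pruned.to_sparse_csr()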

distilling: take a large pretrained model and train a smaller one to mimic it. for example, consider a cloze task (fill in the blanked token in a sentence): compute the token probabilities with the large model, then train the smaller model to reproduce those probabilities.
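
a minimal sketch of a soft-target distillation loss in PyTorch, assuming teacher and student produce logits over the same vocabulary (the temperature value and the model/variable names are illustrative, not from any particular codebase):

    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, temperature=2.0):
        # soften both distributions with a temperature, then push the
        # student's distribution toward the teacher's via KL divergence
        soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
        log_student = F.log_softmax(student_logits / temperature, dim=-1)
        # T^2 keeps gradient magnitudes comparable across temperatures
        return F.kl_div(log_student, soft_teacher,
                        reduction="batchmean") * temperature ** 2

    # for the cloze example: run both models on the same masked input
    # and backprop only through the student (names are hypothetical)
    # loss = distillation_loss(student(masked_input),
    #                          teacher(masked_input).detach())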

in short: distilling fits the model's behavior into a smaller, regular network of potentially totally different architecture, while pruning discards low-magnitude coefficients, leaving a sparser version of the same network.

wizardforhire a day ago

Thanks for taking the time to clarify for me.