DoctorOetker 2 days ago
Pruning and distilling are two totally different things.

Pruning: discarding low-weight connections after training. It makes the network sparser but also less regular (complications for memory layout, and for the compute kernels that have to access the sparse network weights).

Distilling: take a large pretrained model and train a smaller one from it. For example, consider a cloze task (fill in the blanked token in a sentence): compute the token probabilities with the large model, then train the smaller model to reproduce the same probabilities.

So distilling is a form of fitting into a smaller regular network, of potentially totally different architecture, while pruning is a form of discarding low-weight coefficients, resulting in a sparser network.
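A minimal sketch of the two ideas, assuming PyTorch; the function names (magnitude_prune, distill_step) and the generic teacher/student modules are hypothetical, not from any particular library:

    import torch
    import torch.nn.functional as F

    def magnitude_prune(weight: torch.Tensor, sparsity: float = 0.9) -> torch.Tensor:
        # Pruning: zero out the lowest-|w| entries, keeping only the largest ones.
        # The result is sparser but irregular (hence the memory-layout/kernel issues).
        k = max(1, int(sparsity * weight.numel()))
        threshold = weight.abs().flatten().kthvalue(k).values
        return weight * (weight.abs() > threshold)

    def distill_step(teacher, student, optimizer, masked_input, temperature=2.0):
        # Distillation: the student (any architecture) learns to reproduce the
        # teacher's probability distribution over the blanked token.
        with torch.no_grad():
            teacher_logits = teacher(masked_input)       # (batch, vocab)
        student_logits = student(masked_input)           # (batch, vocab)
        loss = F.kl_div(
            F.log_softmax(student_logits / temperature, dim=-1),
            F.softmax(teacher_logits / temperature, dim=-1),
            reduction="batchmean",
        ) * temperature ** 2
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

The point of the sketch: pruning only edits the existing weight tensors in place, while distillation trains a completely separate (and usually smaller, dense) model against the teacher's soft outputs.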
wizardforhire a day ago | parent
Thanks for taking the time to clarify for me. |