Remix clone Hacker News


	▲	nateb2022 a day ago
		> That's not what happens in zero-shot voice cloning
		It is exactly what happens. You are confusing the task (classification vs. generation) with the learning paradigm (zero-shot). In the voice cloning context, the class is the speaker's voice (not observed during training), samples of which are generated by the machine learning model. The definition applies 1:1. During inference, it is predicting the conditional probability distribution of audio samples that belong to that unseen class. It is "predict[ing] the class that they belong to," which very same class was "not observed during training." You're getting hung up on the semantics.
	▲	woodson a day ago
		Jeez, OP asked what it means in this context (zero-shot voice cloning), whereas you quoted a generic definition copied from Wikipedia. I defined it concretely for this context. Don't take it as a slight; there is no need to get all argumentative.