ImPostingOnHN 14 hours ago

> Providing inference-time context (in this case, audio) is no different than giving a prompt to an LLM.

Right... And you have 0-shot prompts ("give me a list of animals"), 1-shot prompts ("give me a list of animals, for example: a cat"), 2-shot prompts ("give me a list of animals, for example: a cat; a dog"), etc.

The "shot" refers to how many examples are provided to the LLM in the prompt, and has nothing to do with training or tuning, in every context I've ever seen.
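
To make the counting concrete, here is a minimal sketch of that prompt-level usage. The helper name `build_k_shot_prompt` is hypothetical, made up for illustration; it just reproduces the example prompts above with `k` examples appended.

```python
def build_k_shot_prompt(instruction, examples, k):
    """Return the instruction followed by the first k examples (k-shot prompt).

    Hypothetical helper for illustration only; not part of any library.
    """
    if k < 0 or k > len(examples):
        raise ValueError("k must be between 0 and len(examples)")
    if k == 0:
        # 0-shot: the bare instruction, no examples in the prompt.
        return instruction
    # k-shot: append k examples, separated the same way as in the prompts above.
    return f"{instruction}, for example: " + "; ".join(examples[:k])


if __name__ == "__main__":
    animal_examples = ["a cat", "a dog"]
    for k in range(3):
        print(f"{k}-shot:", build_k_shot_prompt("give me a list of animals", animal_examples, k))
```

Nothing here touches model weights; the only thing that changes with `k` is the text sent at inference time.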

nateb2022 12 hours ago

> Right... And you have 0-shot prompts ("give me a list of animals"), 1-shot prompts ("give me a list of animals, for example: a cat"), 2-shot prompts ("give me a list of animals, for example: a cat; a dog"), etc.

> The "shot" refers to how many examples are provided to the LLM in the prompt, and has nothing to do with training or tuning, in every context I've ever seen.

In formal ML, "shot" refers to the number of samples available for a specific class during the training phase. You're describing a colloquial usage of the term found only in prompt engineering.

You can't apply an LLMism to a voice cloning model where standard ML definitions apply.
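
For comparison, the per-class sense of "shot" is usually written as an N-way K-shot setup: K labeled samples for each of N classes. A rough sketch, assuming a toy dataset of `(label, sample)` pairs; the function name `sample_support_set` and the speaker/clip names are made up for illustration and do not come from any particular voice cloning model.

```python
import random
from collections import defaultdict


def sample_support_set(dataset, n_way, k_shot, seed=0):
    """Pick n_way classes and k_shot samples per class from (label, sample) pairs.

    Hypothetical helper for illustration; "shot" here counts samples per class.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for label, sample in dataset:
        by_class[label].append(sample)

    # Only classes with at least k_shot samples can take part in the episode.
    eligible = sorted(c for c, items in by_class.items() if len(items) >= k_shot)
    if len(eligible) < n_way:
        raise ValueError("not enough classes with k_shot samples each")

    chosen = rng.sample(eligible, n_way)
    return {c: rng.sample(by_class[c], k_shot) for c in chosen}


if __name__ == "__main__":
    # Toy data (assumed): 4 speakers, 3 clips each.
    toy = [(f"speaker_{i}", f"clip_{i}_{j}") for i in range(4) for j in range(3)]
    support = sample_support_set(toy, n_way=2, k_shot=1)
    print(support)  # 2 speakers, 1 clip each: a 2-way 1-shot support set
```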