geocar a day ago
So if you get your target to record (say) 1 hour of audio, that's one-shot. If you didn't do that (because you have 100 hours of other people talking), that's zero-shot, no?
nateb2022 a day ago
> So if you get your target to record (say) 1 hour of audio, that's a one-shot.

No, that would still be zero-shot. Providing inference-time context (in this case, audio) is no different from giving a prompt to an LLM. Think of it as analogous to an AGENTS.md file included in a prompt: you're not retraining the model, you're simply putting the rest of the prompt into context. If you actually stopped and fine-tuned the model weights on that single clip, that would be one-shot learning.
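A toy sketch of the distinction drawn above, using the comment's own definitions. It is not any real TTS or voice-cloning API: `speaker_embedding`, `synthesize`, and `fine_tune` are hypothetical names, and the "model" is just a dot product. Conditioning on a reference clip only feeds it in as input, so the weights are read but never changed. Fine-tuning on that same clip runs gradient descent and produces new weights.

```python
from typing import List

Vector = List[float]


def speaker_embedding(reference_audio: Vector) -> Vector:
    """Hypothetical encoder: summarize a reference clip as a fixed-size vector.
    Here it is just the mean of each half of the clip, purely for illustration."""
    half = len(reference_audio) // 2
    first, second = reference_audio[:half], reference_audio[half:]
    return [sum(first) / len(first), sum(second) / len(second)]


def synthesize(weights: Vector, text_features: Vector, speaker: Vector) -> float:
    """Toy 'model': dot product of the weights with text features + speaker vector.
    The speaker vector is context (like a prompt); the weights are only read."""
    inputs = text_features + speaker
    return sum(w * x for w, x in zip(weights, inputs))


def fine_tune(weights: Vector, text_features: Vector, speaker: Vector,
              target: float, lr: float = 0.1, steps: int = 50) -> Vector:
    """Gradient descent on squared error for a single example.
    Unlike synthesize(), this returns *new* weights fitted to the clip."""
    w = list(weights)  # copy so the caller's base weights stay intact
    inputs = text_features + speaker
    for _ in range(steps):
        error = sum(wi * xi for wi, xi in zip(w, inputs)) - target
        # d/dw (error^2) = 2 * error * x
        w = [wi - lr * 2 * error * xi for wi, xi in zip(w, inputs)]
    return w


base_weights = [0.5, -0.2, 0.3, 0.1]
clip = [0.2, 0.4, 0.6, 0.8]  # stand-in for a reference recording
text = [1.0, 2.0]            # stand-in for text features

spk = speaker_embedding(clip)
snapshot = list(base_weights)

# "Zero-shot" in the comment's sense: the clip only enters as input.
out = synthesize(base_weights, text, spk)
weights_unchanged = base_weights == snapshot

# "One-shot" in the comment's sense: the weights are updated on that one clip.
tuned = fine_tune(base_weights, text, spk, target=1.0)
tuned_out = synthesize(tuned, text, spk)
```

The point of the toy is only where the clip's information ends up: in the inputs for the first call, and baked into `tuned` for the second.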