nateb2022 | a day ago
Providing inference-time context (in this case, audio) is no different from giving a prompt to an LLM. Think of it as analogous to an AGENTS.md included in a prompt. You're not retraining the model; you're simply putting that material into the context window. If you actually stopped and fine-tuned the model weights on that single clip, that would be one-shot learning.
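To make that concrete, here's a rough sketch of what "putting it into context" looks like in practice, assuming the OpenAI Python client purely as an example (the model name and file path are placeholders): the AGENTS.md contents are just more prompt tokens sent with the request, and no weights change.

```python
from openai import OpenAI

client = OpenAI()

# The file's contents ride along as ordinary prompt tokens.
with open("AGENTS.md") as f:
    agents_md = f.read()

response = client.chat.completions.create(
    model="gpt-4o-mini",  # any chat model works; this one is illustrative
    messages=[
        # "Context" here is simply part of the request, not training data.
        {"role": "system", "content": agents_md},
        {"role": "user", "content": "Summarize the build steps for this repo."},
    ],
)
print(response.choices[0].message.content)
```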
coder543 | a day ago | parent
To me, a closer analogy is in-context learning (ICL). In the olden days of 2023, you didn't just find instruct-tuned models sitting on every shelf. You would often use a base model that had only undergone pretraining and could only generate text continuations of the input it received. If you provided the model with several examples of a question followed by an answer, and then a new question followed by a blank for the next answer, the model would infer from the context that it should answer the new question. This is the most primitive use of ICL, and a very basic way to achieve limited instruction-following behavior.

Given those examples in the prompt, I would call that few-shot ICL. Not zero-shot, even though the model weights are frozen. But I am learning that it is technically called zero-shot, and I will accept that, even if I think it is a confusingly named concept.
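For anyone who never touched a base model, here is a minimal sketch of that few-shot setup, using GPT-2 through Hugging Face transformers purely as a stand-in for "a model that has only undergone pretraining." The questions are illustrative; the point is that the examples live entirely in the prompt and the weights never move.

```python
from transformers import pipeline

# GPT-2 is a base model: pretraining only, no instruction tuning.
generator = pipeline("text-generation", model="gpt2")

# A few Q/A examples, then a new question left blank for the model to complete.
prompt = (
    "Q: What is the capital of France?\n"
    "A: Paris\n"
    "Q: What is the capital of Japan?\n"
    "A: Tokyo\n"
    "Q: What is the capital of Italy?\n"
    "A:"
)

# No weight updates happen here; the model just continues the pattern.
out = generator(prompt, max_new_tokens=5, do_sample=False)
print(out[0]["generated_text"])
```

GPT-2 won't always answer correctly, but the pattern-completion behavior it shows here is exactly the in-context learning being described.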