mwigdahl 8 days ago
Is this just distillation, but with a step to filter out low-quality responses first?
GabrielBianconi 8 days ago | parent
AFAIK, distillation typically refers to tuning on the logits of the larger model, so you wouldn't be able to do that with fine-tuning APIs (OpenAI and Google in our blog post). We fine-tune on the outputs themselves. But broadly speaking, yes: we generate data using a large model, curate the best samples using metrics from the environment, and fine-tune on that data. This isn't a novel technique from an academic perspective; our focus is on applying it to different use cases (e.g. agentic RAG, agentic tool use) and models (OpenAI, Google, Qwen). Thanks!
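A minimal sketch of the generate → curate → fine-tune pipeline described above. The sample structure, score field, and threshold are assumptions for illustration; the output format follows OpenAI's fine-tuning JSONL chat schema.

```python
import json

def curate_samples(samples, min_score):
    """Keep only candidate responses whose environment metric clears a threshold.

    `samples` is a list of dicts like {"prompt": ..., "response": ..., "score": ...},
    where `score` is assumed to come from the task environment (e.g. a reward
    or an exact-match check) -- the exact metric depends on the use case.
    """
    return [s for s in samples if s["score"] >= min_score]

def to_openai_jsonl(samples):
    """Format curated samples as OpenAI fine-tuning JSONL (chat format)."""
    lines = []
    for s in samples:
        record = {
            "messages": [
                {"role": "user", "content": s["prompt"]},
                {"role": "assistant", "content": s["response"]},
            ]
        }
        lines.append(json.dumps(record))
    return "\n".join(lines)

# Hypothetical candidate generations sampled from a large "teacher" model.
candidates = [
    {"prompt": "Q1", "response": "good answer", "score": 0.9},
    {"prompt": "Q1", "response": "bad answer", "score": 0.2},
    {"prompt": "Q2", "response": "ok answer", "score": 0.7},
]

curated = curate_samples(candidates, min_score=0.5)
dataset_jsonl = to_openai_jsonl(curated)
```

The resulting JSONL file would then be uploaded to a fine-tuning API; note this trains on output text only, not on the teacher's logits, which is why it works with hosted APIs.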