javawizard 20 hours ago
> But the other way around is not possible due to the closed nature of GPT-5.

At risk of sounding glib: have you heard of distillation?
dust42 18 hours ago
Distilling from a closed model like GPT-4 via the API would be architecturally crippled. You're restricted to the output token distribution (at best top-k logprobs), with no access to attention patterns, intermediate activations, or layer-wise representations, which are needed for proper knowledge transfer. Without alignment of the Q/K/V matrices or hidden-state spaces, the student cannot learn the teacher's reasoning inductive biases; it can only imitate the teacher's surface behavior, which will likely amplify hallucinations. In contrast, open-weight teachers enable multi-level distillation: KL on the logits, plus MSE on hidden states, plus attention matching. Does that answer your question?
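For concreteness, here is a rough sketch of what that multi-level loss could look like with an open-weight teacher. It is PyTorch, assumes HuggingFace-style model outputs (logits, hidden_states, attentions) and a learned linear projection (proj) to bridge differing hidden sizes; all names and weights are illustrative, not a specific recipe.

    import torch
    import torch.nn.functional as F

    def distillation_loss(teacher_out, student_out, proj, T=2.0,
                          w_kl=1.0, w_hid=0.5, w_attn=0.5):
        # 1) KL on temperature-softened logits: the only term an API-only
        #    teacher could even partially support.
        kl = F.kl_div(
            F.log_softmax(student_out.logits / T, dim=-1),
            F.softmax(teacher_out.logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)

        # 2) MSE on last-layer hidden states; proj is an assumed linear layer
        #    mapping the student's hidden size onto the teacher's.
        hid = F.mse_loss(proj(student_out.hidden_states[-1]),
                         teacher_out.hidden_states[-1])

        # 3) Attention matching on the last layer, assuming equal head counts;
        #    mismatched architectures would need a head/layer mapping here.
        attn = F.mse_loss(student_out.attentions[-1],
                          teacher_out.attentions[-1])

        return w_kl * kl + w_hid * hid + w_attn * attn

With a closed teacher you could keep only the first term, and even that only approximately (reconstructed from sampled text or top-k logprobs), which is exactly the asymmetry described above.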