dust42 20 hours ago

That's right, they all give a variant of that. For example, Qwen says: "I am Qwen, a large-scale language model developed by Alibaba Cloud's Tongyi Lab."

Now given that DeepSeek, Qwen and Kimi are open-weight models while GPT-5 is not, the opposite is more likely: OpenAI can certainly look into their models, but the other way around is not possible due to the closed nature of GPT-5.

javawizard 20 hours ago | parent [-]

> But the other way around is not possible due to the closed nature of GPT-5.

At risk of sounding glib: have you heard of distillation?

dust42 18 hours ago | parent [-]

Distilling from a closed model like GPT-4 through its API would be architecturally crippled.

You're restricted to sampled tokens and, at best, top-k output logprobs, with no access to attention patterns, intermediate activations, or layer-wise representations, which are needed for proper knowledge transfer.

Without alignment of the Q/K/V matrices or hidden-state spaces, the student model cannot learn the teacher's reasoning inductive biases, only its surface behavior, which will likely amplify hallucinations.
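
To make that concrete, here is a minimal sketch of what black-box distillation looks like: you collect completions from the closed API and fine-tune the student with plain next-token cross-entropy on them. The checkpoint name and data below are hypothetical placeholders; the point is that the only training signal is the teacher's sampled text.

    # Minimal sketch of black-box (sequence-level) distillation: the only
    # supervision from a closed API is sampled text, so the student is
    # fine-tuned with plain cross-entropy on teacher completions.
    # Checkpoint name and data below are hypothetical placeholders.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("student-model")             # placeholder
    student = AutoModelForCausalLM.from_pretrained("student-model")  # placeholder
    optimizer = torch.optim.AdamW(student.parameters(), lr=1e-5)

    teacher_texts = ["Q: ...\nA: ..."]  # completions collected via the closed API

    student.train()
    for text in teacher_texts:
        batch = tok(text, return_tensors="pt")
        # labels == input_ids gives standard next-token cross-entropy:
        # the student only ever sees the teacher's *surface* outputs.
        out = student(**batch, labels=batch["input_ids"])
        out.loss.backward()
        optimizer.step()
        optimizer.zero_grad()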

In contrast, open-weight teachers enable multi-level distillation: KL on logits + MSE on hidden states + attention matching.
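
A rough sketch of what such a multi-level loss could look like in PyTorch, assuming both models are run with output_hidden_states=True and output_attentions=True. The layer pairing (last layer only), the matching hidden sizes and head counts, and the loss weights alpha/beta/gamma are illustrative assumptions, not a recipe:

    import torch
    import torch.nn.functional as F

    def distill_loss(student_out, teacher_out, T=2.0,
                     alpha=1.0, beta=0.5, gamma=0.5):
        # 1) KL on temperature-softened logits (the one term a closed API
        #    can partially approximate via top-k logprobs).
        s_logp = F.log_softmax(student_out.logits / T, dim=-1)
        t_prob = F.softmax(teacher_out.logits / T, dim=-1)
        kl = F.kl_div(s_logp, t_prob, reduction="batchmean") * (T * T)

        # 2) MSE on hidden states: needs the teacher's internals, so
        #    open weights only. Assumes equal hidden sizes; real setups
        #    insert a learned projection between mismatched dims.
        hid = F.mse_loss(student_out.hidden_states[-1],
                         teacher_out.hidden_states[-1])

        # 3) Attention matching on the last layer's attention maps.
        #    Assumes equal head counts and sequence lengths.
        attn = F.mse_loss(student_out.attentions[-1],
                          teacher_out.attentions[-1])

        return alpha * kl + beta * hid + gamma * attn

The second and third terms are exactly what a closed API cannot provide.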

Does that answer your question?