light_hue_1 | 3 days ago
They're discovering the wrong thing. And the analogy with biology doesn't hold. They're sensitive not to architecture but to training data. That's like grouping animals by what environment they lived in, so lions and alligators are closer to one another than lions and cats. The real trick is to infer the underlying architecture and show the relationships between architectures. That's not something you can tell easily by just looking at the name of the model. And that would actually be useful. This is pretty useless.
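To make the animal analogy concrete, here is a toy sketch (all feature values invented for illustration): if you cluster by observable, environment-driven traits rather than lineage, the lion lands next to the alligator instead of the housecat. The same failure mode applies to clustering models by training-data-driven behavior instead of architecture.

```python
import math

# Invented "observable" traits: (lives_near_water, large_prey_diet, savanna_habitat)
observed = {
    "lion":      (0.1, 0.9, 0.9),
    "alligator": (0.2, 0.9, 0.8),  # heavy habitat/diet overlap with the lion
    "housecat":  (0.0, 0.2, 0.0),  # the lion's actual close relative
}

def trait_distance(a: str, b: str) -> float:
    """Euclidean distance in the environment-driven trait space."""
    return math.dist(observed[a], observed[b])

# Under these surface traits, lion clusters with alligator, not housecat --
# the grouping reflects environment, not underlying lineage.
print(trait_distance("lion", "alligator") < trait_distance("lion", "housecat"))
```

The point of the sketch is that the clustering is only as meaningful as the features it runs on: distance in behavior space recovers shared environment (training data), not shared ancestry (architecture).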
refulgentis | 3 days ago | parent
This is provocative, but off-base in order to be so: why would we need to work backwards to determine architecture? Similarly, "you can tell easily by just looking at the name of the model" -- that's an unfounded assertion. No, you can't. It's perfectly cromulent, accepted, and quite regular to have a fine-tuned model with nothing in its name indicating what it was fine-tuned on. (We can observe the effects of this even if we aren't familiar enough with the domain to know it directly, e.g. Meta in Llama 4 making it a requirement to have it in the name.)