| ▲ | dryarzeg 5 hours ago | |
So... is this literally a... umm, sorry, I'm just genuinely unsure (really, no sarcasm intended) which terminology to use... finetune of DeepSeek V4-Pro or a post-trained version of DeepSeek V4-Pro Base? I haven't fully dived into the tech report (so I may update my opinion as well as my comment), but thus far the architectural solutions seem to be largely similar to DeepSeek's. Maybe I'm wrong, but that's just my first impression. EDIT: I take my words back (which happens rarely) - although they do build upon DeepSeek's work, their contribution far exceeds merely post-training the base model in a different way. They did introduce something new to the architecture, though I still can't find the full tech report, with Hugging Face and GitHub links returning 404 right now. EDIT-2: Now that I think about it, I'm not quite sure they're going to openly release the full report with methodology, or the model weights, at all. | ||
| ▲ | trollbridge 5 hours ago | parent | next [-] | |
If more people are doing what DeepSeek did and figuring it out, that's a great thing, because DeepSeek figured out how to radically reduce the cost of inference. | ||
| ▲ | BoorishBears 4 hours ago | parent | prev [-] | |
What on earth are you on about, truly. | ||