supern0va 2 hours ago
> The cost of failure at scale is too high for a major to just take a new architecture/mechanism and implement it,

Is it, though? This scrappy startup was able to take a large(-ish) open-weights model and adapt it. Why can't the frontier labs do the same cost-effectively?

> If they want to get acquired, then they should show that they know what they're doing.

I'm sure they would do so under an appropriate NDA as part of negotiations. I'm not sure why you think a full public disclosure is necessary.
cmogni1 40 minutes ago
I don't mean to be shady, but plenty of the details they did release show that they don't know what they're doing. They compare against FlashAttention-2 when FlashAttention-4 has been out (and even if they wanted to stick to Hopper-class GPUs for whatever reason, there's still FlashAttention-3). The two-orders-of-magnitude claim looks like it's for prefill, not next-token decoding, which is a bit duplicitous. Long-context extrapolation experiments typically go well beyond 2x the context length. Etc., etc., etc.

I never said they should make a full public disclosure, but I do think sharing something of substance helps build trust and also gets people excited.

Lastly, frontier labs have incentives other than eking out every dollar and cent. Having the most capable models, not the most cost-effective ones, is a significantly higher priority as OpenAI and Anthropic march toward IPOs. The same is not necessarily true for Google/DeepMind; judging from the public releases of some of their open-weights models alone, cost-effectiveness may be more of a priority for them today.
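To make the prefill-versus-decode point concrete, here's a rough Amdahl's-law-style sketch. The function name and the timings below are made-up placeholders, not measurements from the startup's release; the point is only that a 100x prefill speedup barely moves end-to-end latency when decoding dominates.

```python
def end_to_end_speedup(prefill_s, decode_s, prefill_speedup, decode_speedup=1.0):
    """Overall speedup when prefill and decode are accelerated by different factors.

    prefill_s / decode_s: baseline wall-clock seconds spent in each phase (hypothetical).
    prefill_speedup / decode_speedup: per-phase speedup factors (1.0 = unchanged).
    """
    before = prefill_s + decode_s
    after = prefill_s / prefill_speedup + decode_s / decode_speedup
    return before / after


if __name__ == "__main__":
    # Hypothetical request: 2 s of prefill, 8 s of token-by-token decoding.
    # A 100x prefill-only speedup yields roughly 1.25x end to end.
    print(round(end_to_end_speedup(2.0, 8.0, prefill_speedup=100.0), 2))  # → 1.25
```

Only when both phases get the same 100x does the headline number carry over to what a user actually waits for.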