embedding-shape an hour ago

> Why didn't this author compare Llama 3 with GLM 5.2 (released 1 week ago) which is a more standard attention based LLM? To compare 2 separate families of LLMs and then pointing out that they are different is not a surprising result and detracts from the point the author is trying to make.

The entire point of the comparison is that LLMs today look vastly different from earlier ones. Comparing more similar LLMs would detract from the point I thought the author was trying to make.