alecco 9 hours ago
Related interesting find on Qwen. "Qwen's base models live in a very exam-heavy basin - distinct from other base models like llama/gemma. Shown below are the embeddings from randomly sampled rollouts from ambiguous initial words like "The" and "A":"