JKCalhoun 8 hours ago
"…whereas 35A3B is a lot smarter…" Must. Parse. Is this a 35-billion-parameter model that only needs 3 billion of those parameters active at a time? (Trying to keep up with this stuff.)

EDIT: A later comment seems to clarify: "It's a MoE model and the A3B stands for 3 Billion active parameters…"
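For anyone else parsing this, here is a toy sketch of why a mixture-of-experts (MoE) model has far fewer "active" parameters than total ones. A small router scores every expert for each input, and only the top-k experts actually run. All of the parameters are stored, but only the router plus the chosen experts are used for that input. The class name `ToyMoELayer` and every size below are made up for illustration and don't describe any real model's config. It also simplifies things: in real MoE models, typically only the feed-forward blocks are split into experts, while attention and embeddings are shared and always active.

```python
import math
import random


class ToyMoELayer:
    """Hypothetical toy MoE feed-forward layer with top-k routing (illustration only)."""

    def __init__(self, d_model, d_hidden, num_experts, top_k, seed=0):
        rng = random.Random(seed)
        self.d_model = d_model
        self.top_k = top_k
        # Router: one score vector per expert.
        self.router = [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(num_experts)]
        # Each expert is a small two-layer MLP: W_in (d_hidden x d_model), W_out (d_model x d_hidden).
        self.experts = [
            (
                [[rng.gauss(0, 0.1) for _ in range(d_model)] for _ in range(d_hidden)],
                [[rng.gauss(0, 0.1) for _ in range(d_hidden)] for _ in range(d_model)],
            )
            for _ in range(num_experts)
        ]

    def expert_params(self):
        # Every expert has the same shape, so count the first one.
        w_in, w_out = self.experts[0]
        return len(w_in) * len(w_in[0]) + len(w_out) * len(w_out[0])

    def router_params(self):
        return len(self.router) * self.d_model

    def total_params(self):
        # Everything that has to be stored.
        return self.router_params() + len(self.experts) * self.expert_params()

    def active_params(self):
        # What a single input actually touches: the router plus top_k experts.
        return self.router_params() + self.top_k * self.expert_params()

    def forward(self, x):
        # Score every expert, then keep only the top_k highest-scoring ones.
        scores = [sum(w * xi for w, xi in zip(row, x)) for row in self.router]
        chosen = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[: self.top_k]
        # Softmax over the chosen experts' scores to get mixing weights.
        m = max(scores[i] for i in chosen)
        exps = [math.exp(scores[i] - m) for i in chosen]
        gates = [e / sum(exps) for e in exps]
        out = [0.0] * self.d_model
        for gate, i in zip(gates, chosen):
            w_in, w_out = self.experts[i]
            h = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w_in]  # ReLU
            y = [sum(w * hi for w, hi in zip(row, h)) for row in w_out]
            out = [o + gate * yi for o, yi in zip(out, y)]
        # Experts not in `chosen` were never evaluated for this input.
        return out, chosen


if __name__ == "__main__":
    layer = ToyMoELayer(d_model=8, d_hidden=32, num_experts=16, top_k=2)
    print("total params: ", layer.total_params())   # 8320
    print("active params:", layer.active_params())  # 1152
```

With 16 experts and top-2 routing, this toy layer stores 8320 parameters but uses only 1152 of them for any given input. That's the same idea as "35B total, 3B active," just at a tiny scale with invented numbers.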