darrenf 8 hours ago
> > The performance/intelligence is said to be about the same as the geometric mean of the total and active parameter counts. So, this model should be equivalent to a dense model with about 10.25 billion parameters.

> Sorry, how did you calculate the 10.25B?

The geometric mean of two numbers is the square root of their product. Square root of 105 (35*3) is ~10.25.
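For anyone who wants to check the arithmetic, here is a minimal sketch in Python. It assumes the 35B total and 3B active figures from the comment above, and it takes the geometric-mean rule of thumb at face value as quoted in the thread rather than as an established result. The function name `dense_equivalent` is made up for illustration.

```python
import math


def dense_equivalent(total_b: float, active_b: float) -> float:
    """Geometric mean of total and active parameter counts (in billions).

    Hypothetical helper: it only encodes the rule of thumb quoted in the
    thread, i.e. sqrt(total * active).
    """
    return math.sqrt(total_b * active_b)


# 35B total, 3B active, as in the comment above: sqrt(35 * 3) = sqrt(105)
print(round(dense_equivalent(35, 3), 2))  # 10.25
```

Note that the result always lands between the two inputs and sits closer to the smaller one than the arithmetic mean would (19 here), which is why the estimate comes out nearer to the active count than to the total.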