>so despite the name it is probably best compared with the 8B/9B
It runs much faster than a standard 8B/9B model. The name comes from the fact that it uses per-layer embeddings (PLE).