It mostly doesn't; at 9M parameters it has very limited capacity. The whole idea of this project is to demonstrate how language models work.