acters 9 days ago
Another caveat with this method is that the larger and smaller models need to behave very similarly, because a lot of the savings come from generating the necessary fluff around each detail: grammar, formatting, and the words/letters that transition from one detail to the next. Unsurprisingly, gpt-oss has both larger and smaller models that work very similarly! The two model sizes are so similar that even getting a few wrong would not slow things down enough to fall back to the speed of the larger model (which is the worst case with this setup). We want the speed of the smaller model as much as possible. That is all.
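To make the worst-case point concrete, here is a rough sketch, assuming the method being discussed is a draft-and-verify scheme along the lines of greedy speculative decoding. Everything in it is hypothetical and for illustration only: the toy "models" are plain functions from a token list to the next token, `speculative_decode`, `target`, `good_draft` and `bad_draft` are made-up names, and the cost of running the small model is ignored. It only counts how many times the large model has to run.

```python
from typing import Callable, List, Tuple

# A toy "model": maps the tokens so far to the next token (greedy decoding).
Model = Callable[[List[int]], int]


def speculative_decode(
    target: Model, draft: Model, prompt: List[int], n_new: int, k: int = 4
) -> Tuple[List[int], int]:
    """Greedy draft-and-verify loop; returns (tokens, number of large-model passes)."""
    out = list(prompt)
    passes = 0
    goal = len(prompt) + n_new
    while len(out) < goal:
        # The small model guesses k tokens ahead.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))

        # One large-model pass checks every guessed position
        # (a real system would do this as a single batched forward pass).
        passes += 1
        accepted: List[int] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok != expected:
                accepted.append(expected)  # first mismatch: keep the large model's token
                break
            accepted.append(tok)
        else:
            # Every guess matched, so the same pass also yields one extra token.
            accepted.append(target(out + accepted))
        out.extend(accepted)
    return out[:goal], passes


# Hypothetical toy models: the large model simply counts upward.
def target(ctx: List[int]) -> int:
    return (ctx[-1] + 1) % 50


def good_draft(ctx: List[int]) -> int:
    return (ctx[-1] + 1) % 50  # always agrees with the large model


def bad_draft(ctx: List[int]) -> int:
    return (ctx[-1] + 2) % 50  # never agrees with the large model


if __name__ == "__main__":
    for name, draft in [("good draft", good_draft), ("bad draft", bad_draft)]:
        tokens, passes = speculative_decode(target, draft, [0], n_new=20, k=4)
        print(f"{name}: {passes} large-model passes for 20 new tokens")
```

In this toy setup the output is identical either way, since the large model has the final say on every token. What changes is the number of large-model passes: a draft that always agrees needs 4 passes for 20 tokens, while a draft that never agrees needs 20, which is the same as running the large model by itself. The closer the two models behave, the closer you get to the first case.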