| ▲ | Remi_Etien 5 hours ago | |
25MB is impressive. What's the tradeoff vs the 80M model — is it mainly voice quality or does it also affect pronunciation accuracy on less common words? | ||
| ▲ | rohan_joshi 4 hours ago | parent [-] | |
The 80M model is the highest quality while still being quite efficient. It is more accurate at pronouncing less common words and is also more stable in terms of speed. It's my favorite model. I think the 40M is quite similar to the 80M for most use cases. The 15M is for resource-constrained CPUs, loading in a browser, etc. The new 15M is way better than the previous 80M model (v0.1), so we're able to predictably improve quality, which is very encouraging. | ||