embedding-shape · 2 days ago
Sounds like you're trying to measure two very different things and placing them in the same category. One is the model itself, evaluated under reference conditions, where there is no such thing as an "API failure". The other is the reliability and uptime of a remote API endpoint for LLM inference. If you want to measure their API, do so, but don't put it in the same category as testing the model itself; they're two different metrics.
XCSme · 2 days ago · parent
But how would you test a closed model independently of its API? For example, the speed score (tokens/s) is also variable and changes over time.
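For the throughput metric mentioned above, a minimal sketch of how tokens/s is typically measured client-side, by counting streamed tokens against wall-clock time (the stream object and token counting are illustrative assumptions, not any provider's actual API):

```python
import time


def tokens_per_second(token_count: int, elapsed_s: float) -> float:
    """Throughput: tokens generated divided by wall-clock seconds."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return token_count / elapsed_s


def measure_stream(stream) -> float:
    """Time any iterable of tokens (e.g. a hypothetical streaming
    completion response) and report its throughput."""
    start = time.monotonic()
    count = 0
    for _token in stream:
        count += 1
    return tokens_per_second(count, time.monotonic() - start)


# e.g. 500 tokens in 2 seconds -> 250.0 tok/s
print(tokens_per_second(500, 2.0))
```

Note that a number measured this way bundles network latency and server load in with the model's raw generation speed, which is exactly the entanglement the parent comment is pointing at.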