daemonologist 10 days ago
You mention that you measured cost and latency in addition to accuracy - would you be willing to share those results as well? (I understand that for these open models they would vary between providers, but it would be useful to have an approximate baseline.)
themanmaran 10 days ago | parent
Yes, I'll add that to the writeup! You're right - I initially excluded it because it was so dependent on the provider that there was a lot of variance, especially with the Qwen models. High-level results were:

- Qwen 32b => $0.33/1000 pages => 53s/page
- Qwen 72b => $0.71/1000 pages => 51s/page
- Llama 90b => $8.50/1000 pages => 44s/page
- Llama 11b => $0.21/1000 pages => 8s/page
- Gemma 27b => $0.25/1000 pages => 22s/page
- Mistral => $1.00/1000 pages => 3s/page