biophysboy 14 hours ago
Have you noticed any significant AND consistent differences between them when you switch? I frequently get a better answer from one than from the other, but it feels unpredictable. Your setup seems like a better test of this.
raw_anon_1111 14 hours ago
For the most part, I don’t do chatbots, except for a couple of RAG-based ones. It’s more behind-the-scenes stuff like image understanding, categorization, nuanced sentiment analysis, semantic alignment, etc. I’ve created a framework that lets me test quality in an automated way across prompt changes and models, and I compare cost, speed, and quality. Of all those, the only thing that requires humans to judge the quality is the RAG results.
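A minimal sketch of what such a comparison harness might look like, assuming a labeled example set and exact-match scoring. Everything here is hypothetical and not the commenter's actual framework: the names `evaluate`, `compare`, `keyword_model`, and `always_positive`, the flat per-call cost, and the toy examples are all illustrative stand-ins for real model calls.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Result:
    accuracy: float        # fraction of examples answered correctly
    avg_latency_s: float   # mean wall-clock seconds per call
    total_cost_usd: float  # assumed flat per-call price times call count


def evaluate(model_fn: Callable[[str], str],
             examples: List[Tuple[str, str]],
             cost_per_call_usd: float) -> Result:
    """Run one candidate over every labeled example and score it."""
    correct = 0
    elapsed = 0.0
    for text, expected in examples:
        start = time.perf_counter()
        predicted = model_fn(text)
        elapsed += time.perf_counter() - start
        correct += int(predicted == expected)
    n = len(examples)
    return Result(correct / n, elapsed / n, cost_per_call_usd * n)


def compare(candidates: Dict[str, Tuple[Callable[[str], str], float]],
            examples: List[Tuple[str, str]]) -> Dict[str, Result]:
    """Evaluate each (model_fn, cost_per_call) pair on the same examples."""
    return {name: evaluate(fn, examples, cost)
            for name, (fn, cost) in candidates.items()}


# Hypothetical stand-ins for real model/prompt combinations.
def keyword_model(text: str) -> str:
    return "negative" if "bad" in text.lower() else "positive"


def always_positive(text: str) -> str:
    return "positive"


EXAMPLES = [
    ("Great service", "positive"),
    ("Bad experience", "negative"),
    ("Really bad", "negative"),
    ("Not bad at all", "positive"),
]

if __name__ == "__main__":
    report = compare(
        {"keyword": (keyword_model, 0.002), "baseline": (always_positive, 0.001)},
        EXAMPLES,
    )
    for name, result in report.items():
        print(name, result)
```

Exact-match scoring only suits tasks with a single correct label, such as categorization or sentiment. Open-ended output like RAG answers would still need a human or some other judge, which matches the caveat above.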
kevstev 13 hours ago
Check out https://poe.com - it does the same thing. I agree with your assessment though: while you can get better answers from some models than others, it is hard to predict in advance which model will give you the better answer.