For the most part, I don’t do chatbots except for a couple of RAG-based ones. It’s more behind-the-scenes stuff like image understanding, categorization, nuanced sentiment analysis, semantic alignment, etc.

I’ve created a framework that lets me test quality in an automated way across prompt changes and models, and I compare cost, speed, and quality.

Out of all those, the only thing that requires humans to judge the quality is the RAG results.
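To make the idea concrete, here is a minimal sketch of what such a comparison harness could look like. It is not the framework described above; every name in it (`Candidate`, `evaluate`, `compare`, the stub models, the flat per-call cost) is a hypothetical stand-in, and the stubs would be replaced by real model calls.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Candidate:
    name: str                      # hypothetical label for one model + prompt pair
    predict: Callable[[str], str]  # wraps the actual model call
    cost_per_call: float           # assumption: flat cost per call, in USD


@dataclass
class Result:
    name: str
    accuracy: float
    avg_latency_s: float
    total_cost: float


def evaluate(candidate: Candidate, cases: List[Tuple[str, str]]) -> Result:
    """Run one candidate over labeled cases and record quality, speed, and cost."""
    if not cases:
        raise ValueError("need at least one test case")
    correct = 0
    start = time.perf_counter()
    for text, expected in cases:
        if candidate.predict(text).strip().lower() == expected.lower():
            correct += 1
    elapsed = time.perf_counter() - start
    n = len(cases)
    return Result(candidate.name, correct / n, elapsed / n, candidate.cost_per_call * n)


def compare(candidates: List[Candidate], cases: List[Tuple[str, str]]) -> List[Result]:
    """Rank candidates by accuracy (highest first), then by total cost (lowest first)."""
    results = [evaluate(c, cases) for c in candidates]
    return sorted(results, key=lambda r: (-r.accuracy, r.total_cost))


# Stub "models" so the sketch runs without any API access.
def keyword_stub(text: str) -> str:
    lowered = text.lower()
    return "billing" if "invoice" in lowered or "charge" in lowered else "support"


def constant_stub(text: str) -> str:
    return "support"


cases = [
    ("I was charged twice", "billing"),
    ("My invoice is wrong", "billing"),
    ("The app crashes on login", "support"),
    ("How do I reset my password", "support"),
]

candidates = [
    Candidate("constant-stub", constant_stub, cost_per_call=0.001),
    Candidate("keyword-stub", keyword_stub, cost_per_call=0.002),
]

if __name__ == "__main__":
    for r in compare(candidates, cases):
        print(f"{r.name}: accuracy={r.accuracy:.2f} "
              f"latency={r.avg_latency_s:.6f}s cost=${r.total_cost:.4f}")
```

Exact-match scoring like this only suits tasks with a single right answer, such as categorization. It would not cover the RAG case, where the output needs a human to judge it.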

▲

biophysboy 14 hours ago | parent [-]

So who is the winner using the framework you created?

▲

raw_anon_1111 14 hours ago | parent [-]

It depends. Amazon’s Nova Lite gave me the best speed vs. performance when I needed really quick real-time inference for categorizing a user’s input (think call centers).

One of Anthropic’s models did the best with image understanding, with Amazon’s Nova Pro slightly behind.

For my tests, I used a customer’s specific set of test data.

For RAG, I forgot which did best. But that is much more subjective. I just gave the customer the ability to configure the model and modify the prompt so they could choose.

▲

biophysboy 14 hours ago | parent [-]

Your experience matches mine then... I haven't noticed any clear, consistent differences. I'm always looking for second opinions on this (bc I've gotten fairly cynical). Appreciate it