nijave 2 days ago

This is a place where testing and benchmarking can definitely save you money.

It's the same as compute--you can skip testing and throw money at the problem but you're going to end up paying more.

We have some pretty basic guidelines at work and I think that's a decent starting point. They amount to a few example prompts/problem types and which OpenAI model to try using first for best bang for your buck.

I think some of it also comes down to scale. Buying a 5 pack of sledgehammers isn't a terrible value when everything comes in a "5 pack" and you only need <= 5 tools total. Or more practically, on the small end it's more economical to run general purpose models than to tailor more specific ones. Once you start invoking them enough, there's a break-even point past which the time spent on a tailored or custom model pays for itself and it becomes the cheaper option.
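
To put rough numbers on that break-even point, here's a minimal sketch. It assumes a simple cost model: a one-time cost to build the tailored model, plus a flat per-call cost for each option. The function names and every figure are made up for illustration, not real pricing.

```python
import math


def total_cost(calls, per_call, fixed=0.0):
    """Total spend after `calls` invocations under the assumed linear cost model."""
    return fixed + calls * per_call


def break_even_calls(fixed_cost, general_per_call, tailored_per_call):
    """Smallest call count at which the tailored model is at least as cheap.

    Returns None if the tailored model is not cheaper per call, since the
    up-front cost is then never recovered.
    """
    savings_per_call = general_per_call - tailored_per_call
    if savings_per_call <= 0:
        return None
    return math.ceil(fixed_cost / savings_per_call)


# Hypothetical figures in arbitrary cost units, for illustration only.
fixed = 1000            # assumed one-time cost of tailoring a model
general = 5             # assumed per-call cost of the general purpose model
tailored = 1            # assumed per-call cost of the tailored model

n = break_even_calls(fixed, general, tailored)
print("break-even calls:", n)
print("general total at break-even:", total_cost(n, general))
print("tailored total at break-even:", total_cost(n, tailored, fixed))
```

Below that call count the general purpose model wins; above it the tailored one does. Real costs are rarely this linear (engineering time, retraining, volume pricing), so treat it as a back-of-the-envelope check before you benchmark.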