▲ | michaelbuckbee 3 days ago |
A major current problem is that we're smashing gnats with sledgehammers via undifferentiated model use. Not every problem needs a SOTA generalist model, and as we get systems/services that are more "bundles" of different models with specific purposes I think we will see better usage graphs.
▲ | benreesman 3 days ago | parent | next [-] |
Because none of them are good enough yet to trust completely with any task. Even the absolute best ones still fart out at surprising times, and for most stuff I already have an AI that's always on and costs no cognitive overhead to delegate to: my own brain. So delegating has to be a reliable win. I'm not here to make AI look good, I'm here to make my own performance good, and only a sure thing is a candidate for reflexive delegation.

AI companies advertise peak AI performance; users select AI tools on worst-case AI fuckups: hence, only SOTA is ever in demand. TFA illustrates this well. AI will be judged on its worst performance, just like people are fired for their worst showing, not their best. No one cares about AI performance in ideal (read: carefully contrived) settings. We care how badly it fucks up when we take our eyes off it for 2 seconds.
▲ | empiko 3 days ago | parent | prev | next [-] |
Yeah, but the juiciest tasks are still far from solved. The number of tasks where people are willing to accept low-accuracy answers is not that high. It may be true for some text-processing pipelines, but all the user-facing use cases require good performance.
▲ | mustyoshi 3 days ago | parent | prev | next [-] |
Yeah, this is the thing people miss a lot. 7B and 32B models work perfectly fine for a lot of things, and run on previously high-end consumer hardware. But we're still in the hype phase; people will come to their senses once large-model performance starts to plateau.
▲ | nijave 2 days ago | parent | prev | next [-] |
This is a place testing and benchmarking can definitely save you money. It's the same as compute--you can skip testing and throw money at the problem, but you're going to end up paying more.

We have some pretty basic guidelines at work and I think they're a decent starting point. They amount to a few example prompts/problem types and which OpenAI model to try first for the best bang for your buck.

I think some of it also comes down to scale. Buying a 5-pack of sledgehammers isn't a terrible value when everything comes in a "5 pack" and you only need <= 5 tools total. Or more practically: on the small end it's more economical to run general-purpose models than to tailor more specific ones. Once you start invoking them enough, there's a break-even point where spending more time on the tailored or custom model becomes cheaper.
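That break-even point is easy to sketch as arithmetic. A toy calculation (all dollar figures are made-up illustrations, not real pricing) comparing a general-purpose model's per-call cost against a tailored model that carries a one-off engineering cost:

```python
# Toy break-even estimate: general-purpose model vs. a tailored one.
# All figures are hypothetical illustrations, not real pricing.
general_cost_per_call = 0.020   # $/call, big general model
tailored_cost_per_call = 0.002  # $/call, small tailored/fine-tuned model
tailoring_fixed_cost = 5000.0   # $ one-off engineering/fine-tuning spend

# Tailoring pays off once per-call savings cover the fixed cost.
savings_per_call = general_cost_per_call - tailored_cost_per_call
break_even_calls = tailoring_fixed_cost / savings_per_call

print(f"Break-even at ~{break_even_calls:,.0f} calls")
```

With these made-up numbers the flip happens around 278k calls; below that volume, the sledgehammer 5-pack is the cheaper buy.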
▲ | 3 days ago | parent | prev | next [-] |
[deleted]
▲ | simonjgreen 3 days ago | parent | prev | next [-] |
Completely agree. It’s worth spending time to experiment, too. A reasonably simple chat support system I built recently uses 5 different models depending on the function it’s in. Swapping out different models for different things makes a huge difference to cost, user experience, and quality.
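This kind of per-function routing can be as simple as a lookup table in front of the API client. A minimal sketch (the function names and model tiers are illustrative, not a real config):

```python
# Route each chat-support function to the cheapest model that handles it well.
# The mapping and model names below are hypothetical examples.
MODEL_BY_FUNCTION = {
    "intent_classification": "small-fast-model",  # cheap, latency-sensitive
    "faq_rerank": "small-fast-model",
    "draft_reply": "mid-tier-model",              # quality matters, high volume
    "escalation_summary": "frontier-model",       # rare, highest stakes
}

DEFAULT_MODEL = "mid-tier-model"

def pick_model(function: str) -> str:
    """Return the model assigned to a pipeline function, with a safe default."""
    return MODEL_BY_FUNCTION.get(function, DEFAULT_MODEL)

print(pick_model("intent_classification"))  # small-fast-model
print(pick_model("unknown_function"))       # mid-tier-model
```

The nice property is that each routing decision can be revisited independently as model pricing and quality shift, without touching the rest of the pipeline.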
▲ | alecco 3 days ago | parent | prev | next [-] |
If there was an option to have Claude Opus guide Sonnet I'd use it for most interactions. Doing it manually is a hassle and breaks the flow, so I end up using Opus too often. This shouldn't be that expensive even for large prompts since input is cheaper due to parallel processing.
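The expensive-model-plans/cheap-model-executes split can be sketched as a two-tier loop: one planning call per task, one cheap call per step. The `plan` and `execute` stand-ins below are stubs, not real API clients:

```python
from typing import Callable

def run_task(task: str,
             plan_with: Callable[[str], list[str]],
             execute_with: Callable[[str], str]) -> list[str]:
    """One expensive planning call, then one cheap call per step."""
    steps = plan_with(task)                   # e.g. Opus: decompose the task
    return [execute_with(s) for s in steps]   # e.g. Sonnet: do each step

# Stub "models" so the sketch runs without any API:
plan = lambda task: [f"step {i} of {task}" for i in range(3)]
execute = lambda step: f"done: {step}"

print(run_task("refactor module", plan, execute))
```

The cost win comes from the ratio: the frontier model is billed once per task, while the per-step volume all lands on the cheaper tier.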
▲ | nateburke 3 days ago | parent | prev | next [-] |
generalist = fungible? In the food industry, is it more profitable to sell whole cakes or just the sweetener? The article makes a great point about Replit and legacy ERP systems. The "generative" in generative AI will not replace storage, and storage is where the margins live. Unless the C in CRUD can eventually replace the R and U, with the D a no-op.
▲ | jdietrich 2 days ago | parent | prev [-] |
Isn't this just MoE?