| ▲ | bennydog224 5 days ago | |
From the article, speed & cost match 2.5 Flash. I'm working on a project where there's a huge gap between 2.5 Flash and 2.5 Flash Lite in both performance and cost.

-> 2.5 Flash Lite is super fast & cheap (~1-1.5s inference), but gives poor-quality responses.
-> 2.5 Flash gives high-quality responses, but is fairly expensive & slow (5-7s inference).

I really just need an in-between for Flash and Flash Lite for cost and performance. Right now, users have to wait up to 7s for a quality response.