In my experience, while Gemini does really well in benchmarks I find it much worse when I actually use the model. It's too verbose / doesn't follow instructions very well. Let's see if that changes with this model.