blintz 8 hours ago
I was most surprised that it only took 40 examples for a Qwen fine-tune to match the style and quality of (interactively tuned) Nano Banana. Certainly the end result does not look like the stock output of open-source image generation models. I wonder if, for almost any bulk inference/generation task, it will generally be dramatically cheaper to (use a fancy expensive model to generate examples, perhaps interactively with refinements) -> (fine-tune a smaller open-source model) -> (run the bulk task).
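A rough back-of-envelope sketch of why that pipeline can win on cost (all prices here are hypothetical placeholders, not real provider pricing; the point is only the fixed-vs-marginal structure):

```python
# Hypothetical per-image costs, for illustration only.
TEACHER_COST = 0.12    # expensive frontier model, per generated image
STUDENT_COST = 0.002   # self-hosted fine-tuned model, amortized per image
FINETUNE_COST = 25.0   # one-off fine-tuning run on the curated examples


def bulk_cost_teacher(n_images: int) -> float:
    """Cost of generating the whole bulk task with the expensive model."""
    return n_images * TEACHER_COST


def bulk_cost_distilled(n_images: int, n_examples: int = 40) -> float:
    """Cost of curating ~40 examples with the teacher, fine-tuning once,
    then running the bulk task on the cheap student model."""
    return n_examples * TEACHER_COST + FINETUNE_COST + n_images * STUDENT_COST


# The distilled pipeline pays a fixed cost up front, then wins on volume.
for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} images: teacher ${bulk_cost_teacher(n):,.2f} "
          f"vs distilled ${bulk_cost_distilled(n):,.2f}")
```

With these made-up numbers the crossover comes within the first few hundred images, which is the intuition behind "dramatically cheaper" for bulk workloads.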
cannoneyed 8 hours ago
In my experience image models are very "thirsty" and can often learn the overall style of an image from far fewer examples. Even Qwen is a HUGE model, relatively speaking. Interestingly enough, the model could NOT learn how to reliably generate trees or water, no matter how much data I threw at it or which strategies I tried... This to me is the big failure mode of fine-tuning: it's practically impossible to predict what will work well, what won't, and why.