mitkebes 2 hours ago
All models have improved, but from my understanding, Gemini is the main one that was specifically trained on photos, video, etc. in addition to text. Other models, like earlier ChatGPT builds, would use plugins to handle anything beyond text, such as a plugin that converts an image into text so that ChatGPT could "see" it. Gemini was multimodal from the start, and is naturally better at tasks that involve pictures, videos, 3D spatial logic, etc. The newer ChatGPT models are now multimodal as well, which has probably helped with their SVG art, but I think Gemini still has an edge here.