MrCheeze 3 hours ago

Does anyone understand why LLMs have gotten so good at this? Their ability to generate accurate SVG shapes seems to greatly outshine what I would expect, given their mediocre spatial understanding in other contexts.
tedsanders 2 hours ago

A few thoughts:

- One thing to be aware of is that LLMs can be much smarter than their ability to articulate that intelligence in words. For example, GPT-3.5 Turbo was beastly at chess (~1800 Elo?) when prompted to complete PGN transcripts, but if you asked it questions in chat, its knowledge was abysmal. LLMs don't generalize as well as humans, and sometimes they can do tasks without being able to articulate things that feel essential to those tasks (like answering whether the bicycle is facing left or right).
- Secondly, what has made AI labs so bullish on future progress over the past few years is seeing how little work it takes to get their results. Often, if an LLM sucks at something, that's because no one worked on it (not always, of course). If you directly train a skill, you can see giant leaps in ability with fairly small effort. Big leaps in SVG creation could be coming from relatively small targeted efforts where none existed before.
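The PGN trick mentioned above amounts to framing the task as text completion rather than a chat question: hand the model a partial game transcript and let it continue. A minimal sketch of building such a prompt; the helper function and its details are illustrative assumptions, only the PGN-completion framing comes from the comment.

```python
def pgn_prompt(moves):
    """Build a PGN-style completion prompt from a list of SAN moves.

    Hypothetical sketch: a completion model asked to continue this text
    tends to emit a plausible next move, even if the same model answers
    chess questions poorly when asked directly in a chat format.
    """
    header = '[Event "Example Game"]\n[Result "*"]\n\n'
    numbered = []
    for i in range(0, len(moves), 2):
        # PGN numbers each White/Black pair of moves: "1. e4 e5"
        numbered.append(f"{i // 2 + 1}. " + " ".join(moves[i:i + 2]))
    # Trailing space invites the model to complete the next move
    return header + " ".join(numbered) + " "

prompt = pgn_prompt(["e4", "e5", "Nf3", "Nc6", "Bb5"])
# The model would be asked to continue the text after "3. Bb5 "
```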
simonw 3 hours ago

My best guess is that the labs put a lot of work into HTML and CSS spatial stuff because web frontend is such an important application of the models, and those improvements leaked through to SVG as well.
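One way to see why CSS layout skill could leak through: an SVG drawing is just text whose coordinates have to agree with each other spatially, much like positioned elements on a web page. A sketch of that idea below; the scene, shapes, and all coordinate values are illustrative assumptions, not from the thread.

```python
def simple_scene():
    """Return an SVG string for a toy scene: a sun above a ground strip.

    Getting this "right" requires the same kind of spatial bookkeeping as
    CSS layout: the sun's circle (cy=30, r=20) must sit entirely above
    the ground rectangle, which starts at y=90.
    """
    parts = [
        '<svg xmlns="http://www.w3.org/2000/svg" width="200" height="120">',
        '  <rect x="0" y="90" width="200" height="30" fill="green"/>',
        '  <circle cx="160" cy="30" r="20" fill="gold"/>',
        "</svg>",
    ]
    return "\n".join(parts)
```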
mitkebes an hour ago

All models have improved, but from my understanding, Gemini is the main one that was specifically trained on photos/video/etc. in addition to text. Other models, like earlier ChatGPT builds, would use plugins to handle anything beyond text, such as a plugin that converts an image into a text description so that ChatGPT could "see" it. Gemini was multimodal from the start and is naturally better at tasks involving pictures/videos/3D spatial logic/etc. The newer ChatGPT models are also multimodal now, which has probably helped with their SVG art as well, but I think Gemini still has an edge here.
pknerd 3 hours ago

> Does anyone understand why LLMs have gotten so good at this?

Added more IF/THEN/ELSE conditions.