dragonwriter 2 days ago

> For images and videos I guess each character, creature, idea in it is a token?

No. For images, I'd expect the token count to usually be asymptotically proportional to the area of the image. This is certainly the case for input tokens on OpenAI's models that accept images; how outputs are tokenized is more opaque. You probably won't get a neat one-to-one intuition for what a single token represents, but you don't need one for tokens to be a useful, straightforward basis for pricing: the mathematical relationship between tokens and image size can be published, and the size of the image is a known quantity. (Videos could conceptually work like images with an additional dimension.)
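To make "the relationship can be published" concrete, here's a rough sketch of a tile-based scheme modeled on what OpenAI documented for high-detail image inputs on GPT-4-class vision models: resize, cover the image with 512px tiles, and charge a base cost plus a per-tile cost. The constants (85 base, 170 per tile, 2048px and 768px resize targets) and the assumption that images are never upscaled are my reading of that documentation and may not match current models, so treat this as illustrative, not authoritative.

```python
import math

# Assumed constants, modeled on OpenAI's published high-detail scheme for
# GPT-4-class vision inputs; illustrative only, they may have changed.
BASE_TOKENS = 85
TOKENS_PER_TILE = 170
TILE_SIZE = 512
MAX_SIDE = 2048
SHORT_SIDE_TARGET = 768


def image_input_tokens(width: int, height: int) -> int:
    """Estimate input tokens for one image under the assumed tiling scheme."""
    w, h = float(width), float(height)
    # Step 1: scale down (assumed never up) to fit within MAX_SIDE x MAX_SIDE.
    scale = min(1.0, MAX_SIDE / max(w, h))
    w, h = w * scale, h * scale
    # Step 2: scale down (assumed never up) so the shorter side is at most
    # SHORT_SIDE_TARGET pixels.
    scale = min(1.0, SHORT_SIDE_TARGET / min(w, h))
    w, h = w * scale, h * scale
    # Step 3: count the TILE_SIZE x TILE_SIZE tiles needed to cover the image.
    tiles = math.ceil(w / TILE_SIZE) * math.ceil(h / TILE_SIZE)
    return BASE_TOKENS + TOKENS_PER_TILE * tiles


if __name__ == "__main__":
    for size in [(1024, 1024), (2048, 4096)]:
        print(size, image_input_tokens(*size))
```

Setting the resize caps aside, the count grows with the number of tiles, i.e. roughly with area, which is the relationship described above; the caps are what bound the cost of very large inputs.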