Remix.run Logo
dragonwriter 2 days ago

No, it'll certainly be more expensive in any conceivable model that handles all three modalities, but if the model uses an architecture like current autoregressive, token-based multimodal LLMs/VLMs, tokens will make just as much sense as the basis for pricing, and be similarly straightforward, as with text and images.