Remix clone Hacker News

	dragonwriter 2 days ago
		No, it'll certainly be more expensive in any conceivable model that handles all three modalities. But if the model uses an architecture like current autoregressive, token-based multimodal LLMs/VLMs, tokens will make just as much sense as the basis for pricing as they do for text and images, and be similarly straightforward.
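		To make the token-pricing point concrete, here is a rough sketch of what per-token billing across modalities could look like. Everything in it is an assumption for illustration: the modality names, the `PRICE_PER_MILLION_TOKENS` table, the prices themselves, and the `request_cost` helper are hypothetical and not taken from any real provider.

```python
# Hypothetical per-million-token prices, one entry per modality.
# These numbers are placeholders, not real provider pricing.
PRICE_PER_MILLION_TOKENS = {
    "text": 1.00,
    "image": 1.00,
    "third_modality": 1.00,  # stand-in for whatever the third modality is
}


def request_cost(token_counts, prices=PRICE_PER_MILLION_TOKENS):
    """Return the cost of one request billed purely by token count.

    token_counts maps a modality name to the number of tokens the
    request consumed in that modality. An unknown modality raises KeyError.
    """
    return sum(prices[modality] * count / 1_000_000
               for modality, count in token_counts.items())


if __name__ == "__main__":
    # A request mixing all three modalities is billed by the same rule.
    usage = {"text": 2_000, "image": 1_500, "third_modality": 40_000}
    print(f"cost: ${request_cost(usage):.6f}")
```

		The point of the sketch is only that the billing rule keeps the same shape when a modality is added: the request gets more expensive because it consumes more tokens, not because pricing needs a different unit.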