Terr_ 11 hours ago
> generating a 10-second AI video costs roughly 160 times more than generating an equivalent amount of text

Hold up, "equivalent" how? It can't be based on "cost" of generation, or else it would be a 1x factor, by definition. Perhaps "costs" in this case refers to the unprofitable gap between revenues and expenses?

> Table 2

Weird, so it looks like some person just arbitrarily decided that 1K GPT-4 text tokens "is equivalent to" 10s of Sora 2 video? That doesn't seem very rigorous.
motbus3 10 hours ago
Let me type and think (I put it through Gemini for English translation).

The 1080p tier, the most expensive one, is 0.70 USD per second. Since Sora 2 runs at 30 FPS, that works out to roughly 2.3c per frame.

While a single 1920x1080 static image is 765 tokens, video models use spacetime compression. Instead of a raw 22,950 tokens per second (765 tokens x 30 frames), a second of 1080p video equates to roughly 10,000 'latent tokens' due to temporal redundancy. Adding 20 tokens per second of audio, we get roughly 10,020 tokens per second of output.

At $0.70 per second for ~10,020 tokens, the cost is approximately $0.00007 per token for Sora 2. 10 seconds of Sora 2 video would cost $7.00 for roughly 100,200 tokens.

In comparison, GPT-5.4-pro at 15 USD per 1M output tokens costs $0.000015 per token. Generating 100,200 tokens of text would cost only about $1.50. This puts Sora 2 at roughly 4.7x more expensive than GPT-5.4-pro per token generated.

However, if we ignore video compression and treat every frame as a unique 1080p image (765 tokens each), Sora 2 becomes roughly 30x more expensive in terms of raw computational effort per frame.
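The per-token arithmetic above can be re-run as a small sketch. Every input is a figure quoted in the comment (price per second, frame rate, tokens per image, the latent-token estimate, the text price), and none of them is verified here. The constant names and the `cost_ratio` helper are made up for illustration.

```python
# Inputs are the figures quoted in the comment; they are assumptions, not verified prices.
SORA_USD_PER_SECOND = 0.70          # 1080p tier
FPS = 30
TOKENS_PER_FRAME = 765              # one 1920x1080 still image
LATENT_TOKENS_PER_SECOND = 10_000   # commenter's estimate after spacetime compression
AUDIO_TOKENS_PER_SECOND = 20
TEXT_USD_PER_MILLION_TOKENS = 15.0  # quoted output price for the text model


def cost_ratio(video_tokens_per_second: float) -> float:
    """Video cost per token divided by text cost per token."""
    video_usd_per_token = SORA_USD_PER_SECOND / video_tokens_per_second
    text_usd_per_token = TEXT_USD_PER_MILLION_TOKENS / 1_000_000
    return video_usd_per_token / text_usd_per_token


usd_per_frame = SORA_USD_PER_SECOND / FPS
compressed = LATENT_TOKENS_PER_SECOND + AUDIO_TOKENS_PER_SECOND
uncompressed = TOKENS_PER_FRAME * FPS + AUDIO_TOKENS_PER_SECOND

print(f"cost per frame: ${usd_per_frame:.4f}")
print(f"compressed ({compressed} tokens/s): {cost_ratio(compressed):.2f}x")
print(f"uncompressed ({uncompressed} tokens/s): {cost_ratio(uncompressed):.2f}x")
```

With these inputs the compressed case comes out at about 4.66x. The uncompressed count (22,970 tokens per second) comes out near 2.03x on the same per-token basis, so the 30x figure presumably rests on a basis the comment does not spell out.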
trillic 11 hours ago
It's a well-known fact that 1 Picture == 1000 words.
PaulHoule 11 hours ago
Well, I guess you could say there is some amount of text that entertains you as much as a 10s Sora video. Judged in terms of time, a fast reader might read 50 words in 10s, and that is what, 100 tokens? If somebody wants to fudge that up by a factor of 10 (a picture is worth a thousand words or something) you get where they are.

Now personally I am not entertained by motion-for-the-sake-of-motion Instagram reels; they actually make me queasy despite having a cast iron stomach and having taught myself to not get sick in VR. So if that's 10s of entertainment, leave me out. I don't care if Tom Cruise is whaling on Brad Pitt or the other way around for that matter, but boy do I want to see the body thetans burst out of Cruise's body when OTIII goes horribly wrong.

My reaction to the article was funny. I mean, I saw that 160x thing and thought it was bogus, and of course it is all AI generated and poorly formatted to boot, but I did like the overall message. It does remind me of the early 2010s, when a lot of sites with photo-based content (including mine) were going out of business because the revenue wasn't enough to pay the hosting costs, a few newcomers like Instagram were survivors, and Google was obviously cleaning up with video on YouTube.

From the viewpoint of business models for AI video I think there are two questions: (i) how many times can you get people to watch the same video? I mean, no matter how expensive it is, if you get enough views/ad impressions/other revenue you are OK. (ii) How does it compete with some other way to generate the video?

The picture that the $20 subscription costs $65 to serve doesn't sound too crazy to me. I mean, there might be somebody who can get 3x the value out of a 10s Sora video than somebody else, or they could get the cost down to about a third.
Aedelon 11 hours ago
[dead]