spyder 8 hours ago

Great, especially that they still have an open-weight variant of this new model too. But what happened to their work on their unreleased SOTA video model? Did it stop being SOTA because others got ahead, so they folded the project, or what? YT video about it: https://youtu.be/svIHNnM1Pa0?t=208 They even removed its page: https://bfl.ai/up-next/

liuliu 7 hours ago | parent | next [-]

As a startup, they pivoted and focused on image models (they are model providers, and image models often have more use cases than video models, not to mention that their bigger dataset moat is still in images, not video).

andersa 6 hours ago | parent | prev | next [-]

I heard a possibly unsubstantiated rumor that they had a major failed training run with the video model and canceled the project.

qoez 5 hours ago | parent [-]

Makes no sense, since they should have checkpoints from earlier in the run that they could restart from, and they should have regular checks that track whether a model has exploded, etc.
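
To make the checkpoint-and-restart idea concrete, here is a minimal sketch in plain Python. Nothing here is known about how BFL actually trains; `train_with_rollback`, `step_fn`, `spike_factor`, and `max_rollbacks` are all hypothetical names made up for illustration. It keeps an in-memory checkpoint, flags a step as "exploded" when the loss is non-finite or jumps by a large factor, and rolls back a bounded number of times before giving up.

```python
import copy
import math


def train_with_rollback(step_fn, state, num_steps, checkpoint_every=100,
                        spike_factor=10.0, max_rollbacks=3):
    """Run step_fn(state, step) -> (new_state, loss) with rollback on divergence.

    Illustrative only: a real run would write checkpoints to disk and would
    usually change something (data order, learning rate) before retrying.
    """
    # Checkpoint holds (step, state copy, last good loss at that step).
    checkpoint = (0, copy.deepcopy(state), None)
    last_loss = None
    rollbacks = 0
    step = 0
    while step < num_steps:
        new_state, loss = step_fn(state, step)
        spiked = last_loss is not None and loss > spike_factor * last_loss
        if not math.isfinite(loss) or spiked:
            if rollbacks >= max_rollbacks:
                raise RuntimeError(
                    f"diverged at step {step} after {rollbacks} rollbacks")
            rollbacks += 1
            # Discard the bad step and resume from the last checkpoint.
            step, saved, last_loss = checkpoint
            state = copy.deepcopy(saved)
            continue
        state, last_loss = new_state, loss
        step += 1
        if step % checkpoint_every == 0:
            checkpoint = (step, copy.deepcopy(state), last_loss)
    return state, rollbacks
```

This only covers the "loss blew up" kind of failure. It does nothing for a run that stays numerically stable but never reaches its quality targets.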

embedding-shape 5 hours ago | parent | next [-]

I didn't read "major failed training run" as in "the process crashed and we lost all data" but more like "After spending N weeks on training, we still didn't achieve our target(s)", which could be considered "failing" as well.

observationist an hour ago | parent | prev [-]

There's always a possibility that something implicit to the early model structure causes it to explode later, even if it's a well known, otherwise stable architecture, and you do everything right. A cosmic bit flip at the start of a training run can cascade into subtle instability and eventual total failure, and part of the hard decision making they have to do includes knowing when to start over.

I'd take it with a grain of salt; these people are chainsaw jugglers and know what they're doing, so any sort of major hiccup was probably planned for. They'd have plan b and c, at a minimum, and be ready to switch - the work isn't deterministic, so you have to be ready for failures. (If you sense an imminent failure, don't grab the spinny part of the chainsaw, let it fall and move on.)

echelon 7 hours ago | parent | prev [-]

Image models are more fundamentally important at this stage than video models.

Almost all of the control in image-to-video comes through an image. And image models still need a lot of work and innovation.

On a real physical movie set, think about all of the work that goes into setting the stage. The set dec, the makeup, the lighting, the framing, the blocking. All the work before calling "action". That's what image models do and must do in the starting frame.

We can get way more influence out of manipulating images than video. There are lots of great video models and it's highly competitive. We still have so much need on the image side.

When you do image-to-video, yes you control evolution over time. But the direction is actually lower in terms of degrees of freedom. You expect your actors or explosions to do certain reasonable things. But those 1024x1024xRGB pixels (or higher) have way more degrees of freedom.

Image models have more control surface area. You exercise control over more parameters. In video, staying on rails or certain evolutionary paths is fine. Mistakes can not just be okay, they can be welcome.

It also makes sense that most of the work and iteration goes into generating images. It's a faster workflow with more immediate feedback and productivity. Video is expensive and takes much longer. Images are where the designer or director can influence more of the outcomes with rapidity.

Image models still need way more stylistic control, pose control (not just ControlNets for limbs, but facial expressions, eyebrows, hair - everything), sets, props, consistent characters and locations and outfits. Text layout, fonts, kerning, logos, design elements, ...

We still don't have models that look as good as Midjourney. Midjourney is 100x more beautiful than anything else - it's like a magazine photoshoot or dreamy Instagram feed. But it has the most lackluster and awful control of any model. It's a 2021-era model with 2030-level aesthetics. You can't place anything where you want it, you can't reuse elements, you can't have consistent sets... But it looks amazing. Flux looks like plastic, Imagen looks cartoony, and OpenAI GPT Image looks sepia and stuck in the 90's. These models need to compete on aesthetics and control and reproducibility.

That's a lot of work. Video is a distraction from this work.

cubefox 6 hours ago | parent [-]

Hot take: text-to-image models should be biased toward photorealism. This is because if I type in "a cat playing piano", I want to see something that looks like a 100% real cat playing a 100% real piano. Because, unless specified otherwise, a "cat" is trivially something that looks like an actual cat. And a real cat looks photorealistic. Not like a painting, or cartoon, or 3D render, or some fake almost-realistic-but-clearly-wrong "AI style".

85392_school 5 hours ago | parent | next [-]

FYI: photorealism is art that imitates photos, and I see the term misused a lot both in comments and prompts (where you'll actually get suboptimal results if you say "photorealism" instead of describing the camera that "shot" it!)

cubefox 4 hours ago | parent [-]

I meant it here in the sense of "as indistinguishable from a photo as the model can make it".

minimaxir 5 hours ago | parent | prev [-]

As Midjourney has demonstrated, the median user of AI image generation wants those aesthetic dreamy images.

cubefox 4 hours ago | parent [-]

I think it's more likely this is just a niche that Midjourney has occupied.

loudmax 3 hours ago | parent [-]

If Midjourney is a niche, then what is the broader market for AI image generation?

Porn, obviously, though if you look at what's popular on civitai.com, a lot of it isn't photo-realistic. That might change as photo-realistic models are fully out of the uncanny valley.

Presumably personalized advertising, but this isn't something we've seen much of yet. Maybe this is about to explode into the mainstream.

Perhaps stock-photo type images for generic non-personalized advertising? This seems like a market with a lot of reach, but not much depth.

There might be demand for photos of family vacations that didn't actually happen, or removing erstwhile in-laws from family photos after a divorce. That all seems a bit creepy.

I could see some useful applications in education, like "Draw a picture to help me understand the role of RNA." But those don't need to be photo-realistic.

I'm sure people will come up with more and better uses for AI-generated images, but it's not obvious to me there will be more demand for images that are photo-realistic, rather than images that look like illustrations.

echelon 2 hours ago | parent | next [-]

> If Midjourney is a niche, then what is the broader market for AI image generation?

Midjourney is one aesthetically pleasing data point in a wide spectrum of possibilities and market solutions.

Creator economy is huge and is outgrowing Hollywood and the Music Industry combined.

There's all sorts of use cases in marketing, corporate, internal comms.

There are weird new markets. A lot of people simply subscribe to Midjourney for "art therapy" (a legit term) and use it as a social media replacement.

The giants are testing whether an infinite scroll of 100% AI content can beat human social media. Jury's out, but it might start to chip away at Instagram and TikTok.

Corporate wants certain things. Disney wants to fine tune. They're hiring companies like MoonValley to deliver tailored solutions.

Adobe is building tools for agencies and designers. They are only starting to deliver competent models (see their conference videos), and they're going about this in a very different way.

ChatGPT gets the social trend. Ghibli. Sora memes.

> Porn, obviously, though if you look at what's popular on civitai.com, a lot of it isn't photo-realistic.

Civitai is circling the drain. Even before the unethical and religious Visa blacklisting, the company was unable to steer itself to a Series A. Stable Diffusion and local models are still way too hard for 99.99% of people and will never see the same growth as a Midjourney or OpenAI that have zero sharp edges and that anyone in the world can use. I'm fairly certain an "OnlyFans but AI" will arise and make billions of dollars. But it has to be so easy a trucker who doesn't learn to code can use it from their 11 year old Toshiba.

> Presumably personalized advertising, but this isn't something we've seen much of yet.

Carvana pioneered this almost five years ago. I'll try to find the link. This isn't going to really take off though. It's creepy and people hate ads. Carvana's use case was clever and endearing though.

cubefox 3 hours ago | parent | prev | next [-]

Well, as I said, if I type "cat", the most reasonable interpretation of that text string is a perfectly realistic cat.

If I want an "illustration" I can type in "illustration of a cat". Though of course that's still quite unspecific. There are countless possible unrealistic styles for pictures (e.g. line art, manga, oil painting, vector art etc), and the reasonable thing is that the users should specify which of these countless unrealistic styles they want, if they want one. If I just type in "cat" and the model gives me, say, a water color picture of a cat, it is highly improbable that this style happens to be actually what I wanted.

observationist an hour ago | parent [-]

If I want a badly drawn, salad fingers inspired scrawl of a mangy cat, it should be possible. If I want a crisp, xkcd depiction of a cat, it should capture the vibe, which might be different from a stick fighters depiction of a cat, or "what would it look like if George Washington, using microsoft paint for the first time, right after stepping out of the time machine, tried to draw a cat"

I think we'll probably need a few more hardware generations before it becomes feasible to use chatgpt 5 level models with integrated image generation. The underlying language model and its capabilities, the RL regime, and compute haven't caught up to the chat models yet, although nano-banana is certainly doing something right.

wiredpancake an hour ago | parent | prev [-]

[dead]