rimmontrieu 4 days ago

Impressive examples, but with GenAI it always comes down to the fact that you have to cherry-pick the best result from many failed attempts. Right now it feels like they're pushing the narrative that ExpectedOutput = LLM(Prompt, Input), when it's actually ExpectedOutput = LLM(Prompt, Input) * Takes, where Takes can vary from 1 to 100 or more.
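
A minimal sketch of that formula in Python, with hypothetical `llm` and `is_acceptable` functions standing in for a real model call and a quality check:

```python
import random

def llm(prompt: str, user_input: str) -> str:
    # Hypothetical stand-in for a real, stochastic model call.
    return f"take-{random.randint(0, 9)}"

def is_acceptable(output: str) -> bool:
    # Hypothetical quality gate; in practice this is a human cherry-picking.
    return output.endswith("7")

def generate(prompt: str, user_input: str, max_takes: int = 100):
    # ExpectedOutput = LLM(Prompt, Input) * Takes: keep sampling until
    # something passes, and report how many takes it cost.
    for takes in range(1, max_takes + 1):
        output = llm(prompt, user_input)
        if is_acceptable(output):
            return output, takes
    return None, max_takes
```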

antiraza 2 days ago

Why is that a bad thing, or even an unexpected thing? If you pick up a paintbrush, you don't nail every stroke on the canvas -- just because it's programmatic doesn't mean it should behave like a calculator.

LLMs and image generators are cross-pollinating human language and human visual information -- both really fuzzy mediums.

I think learning how to 'use this instrument' and 'finding the perfect brush stroke' are part of how they're supposed to work (at least in their current form). I also don't know that showing good outputs from these inputs frames the narrative as one-and-done... I think the rest of the owl is kind of implied.

raincole 4 days ago

ML researchers have been using Top-5 accuracy for quite a long time, especially in computer vision.

Of course it's a ridiculous metric in most use cases (like in a self-driving car: your 4th guess is that you need to brake? Cool...). But somehow people in ML have normalized it.
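
For reference, Top-5 accuracy just asks whether the true label lands anywhere in a model's five highest-scoring guesses -- a plain-Python sketch:

```python
def top5_accuracy(predictions, labels):
    """predictions: one list of class scores per sample; labels: true class indices."""
    hits = 0
    for scores, label in zip(predictions, labels):
        # Indices of the five highest-scoring classes for this sample.
        top5 = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:5]
        hits += label in top5
    return hits / len(labels)

# e.g. top5_accuracy([[0.1, 0.5, 0.2, 0.9, 0.3, 0.4]], [1]) -> 1.0
```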

vunderba 4 days ago

That's why, on my GenAI Comparison site, I always record the number of rolls it takes each model to reach an acceptable result - it's a rough metric for how much you have to fight to steer the model in the right direction.
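
In spirit that metric is just the mean take count from a loop like the sketch above; a hypothetical bit of bookkeeping (`record_rolls` and `mean_rolls` are made-up names, not the site's actual code):

```python
from collections import defaultdict
from statistics import mean

rolls_per_model = defaultdict(list)

def record_rolls(model_name: str, rolls: int) -> None:
    # Log how many attempts one acceptable result cost for this model.
    rolls_per_model[model_name].append(rolls)

def mean_rolls(model_name: str) -> float:
    # Lower is better: less fighting to steer the model.
    return mean(rolls_per_model[model_name])
```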