sebastiennight 2 days ago

I love that as soon as he writes,

> The plan was simple.

you know you're in for a funny read.

More seriously, though, the JSON example from a vision-language model is interesting, but it does not account for how much extrapolation (hallucination) the model will insert over time.

For instance, your VLM will probably start inserting details that are not visible in the image (such as the color of the team's jersey), inferring them from the team's three-letter identifier.

So the reliability of the system will go down over time, and it probably compounds if you're using some of that info to feed further steps in the loop.
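
To put a rough shape on that compounding, here's a back-of-the-envelope sketch. It assumes every step in the loop is hallucination-free with the same fixed probability, independently of the other steps. That is a simplification (real errors are likely correlated), and both the `chain_reliability` name and the 0.98 figure are made up for illustration, not measured from any model.

```python
def chain_reliability(per_step_accuracy: float, steps: int) -> float:
    """Chance that all `steps` steps are hallucination-free.

    Assumes each step succeeds independently with probability
    `per_step_accuracy` (an illustrative simplification).
    """
    if not 0.0 <= per_step_accuracy <= 1.0:
        raise ValueError("per_step_accuracy must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step_accuracy ** steps


if __name__ == "__main__":
    # 0.98 is a hypothetical per-step accuracy, not a benchmark result.
    for steps in (1, 5, 10, 20):
        print(f"{steps:>2} steps: {chain_reliability(0.98, steps):.3f}")
```

Even with a per-step accuracy that looks fine in isolation, the chance that the whole chain stays clean keeps shrinking as steps are added, which is the worry with feeding VLM output back into the loop.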