epolanski 3 days ago

> From my experience, even the top models continue to fail delivering correctness on many tasks even with all the details and no ambiguity in the input.

Please provide examples, both of the problem and of your input, so we can double-check.