madanparas 4 hours ago

The "RL repair loop" is iterative LLM prompting with stderr feedback, not reinforcement learning. There is no training code, no reward function, and no environment in the repo. The loop also freezes the scene spec and only regenerates code, so if the planner specified 12 objects that geometrically do not fit on screen, three repair attempts will not help.
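Here is a minimal Python sketch of the loop as described, to make the point concrete. `generate` is a hypothetical stand-in for the LLM call, and plain `python -c` stands in for a Manim render. Neither comes from the repo. Nothing in it is trained: each attempt sees only the previous attempt's stderr, and the scene spec is passed through unchanged.

```python
import subprocess
import sys
from typing import Callable, Optional, Tuple

# Hypothetical signature for the LLM call: (frozen scene spec, stderr from the
# previous attempt or None on the first try) -> generated Python source.
Generator = Callable[[str, Optional[str]], str]


def run_script(source: str, timeout: float = 60.0) -> subprocess.CompletedProcess:
    """Run generated source in a fresh interpreter, capturing stdout/stderr."""
    return subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
        timeout=timeout,
    )


def repair_loop(spec: str, generate: Generator, max_attempts: int = 3) -> Tuple[Optional[str], int]:
    """Regenerate code from a fixed spec, feeding stderr back, until it runs.

    Returns (working_source, attempts_used) or (None, max_attempts).
    `spec` is never revised, so an infeasible spec stays infeasible.
    """
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        source = generate(spec, feedback)
        result = run_script(source)
        if result.returncode == 0:
            return source, attempt
        feedback = result.stderr  # the only signal the next attempt receives
    return None, max_attempts


if __name__ == "__main__":
    # Toy generator: the first attempt raises NameError, the retry "fixes" it.
    def toy_generate(spec: str, feedback: Optional[str]) -> str:
        return "print(undefined_name)" if feedback is None else "print('ok')"

    print(repair_loop("12 circles in a row", toy_generate))
```

With the toy generator, the loop succeeds on the second attempt. Because `spec` never changes, a layout problem in the spec itself (too many objects for the frame) fails the same way on every attempt. Only a code-level bug like the toy `NameError` can be repaired this way.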

yorwba 42 minutes ago

There's no training code because the author is using an external service for that https://docs.primeintellect.ai/hosted-training/getting-start... The reward function is https://github.com/HarleyCoops/Math-To-Manim/blob/d1c412d22a... The environment is iterative LLM prompting.

The idea is apparently that a model that is bad at fixing its own mistakes might become better if you train it on this task using reinforcement learning.
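Here is how such a loop could become an RL task. The repair episode serves as the environment, and a scalar reward scores each episode for the policy update. The function below is a hypothetical illustration only, not the reward function linked above. The scoring scheme is an assumption: full reward for a first-try render, less for each retry, and zero if no attempt succeeds. The `penalty` value is also an assumption.

```python
from typing import Sequence


def repair_reward(returncodes: Sequence[int], max_attempts: int = 3, penalty: float = 0.25) -> float:
    """Hypothetical episode reward for a render-and-repair loop.

    `returncodes` lists the exit status of each render attempt in order.
    A first-try success scores 1.0, each extra attempt subtracts `penalty`,
    and no success within `max_attempts` scores 0.0.
    """
    for i, code in enumerate(returncodes[:max_attempts]):
        if code == 0:
            return max(0.0, 1.0 - penalty * i)
    return 0.0


if __name__ == "__main__":
    for episode in ([0], [1, 0], [1, 1, 0], [1, 1, 1]):
        print(episode, repair_reward(episode))
```

Under a reward shaped like this, the policy is credited specifically for turning stderr feedback into a working render in fewer attempts. That appears to be the skill the training is meant to improve.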

tptacek 4 hours ago

Thanks, I was wondering what this README could have meant by "RL loop" here.