hamasho | 4 hours ago
Story time. I used Python's Instructor[1], a package that forces the model's output to match a predefined Pydantic model. It's used like in the example below, and the output is guaranteed to fit the model.

I defined a response model for a chain-of-thought prompt, with the answer and its thinking process, then asked questions.
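Here's a minimal sketch of what that setup looked like. The `MathAnswer` name and the `answer`/`reasoning` fields come from the story; the field types, the model name, and the sample question are assumptions, and the real model had far more properties:

    import instructor
    from openai import OpenAI
    from pydantic import BaseModel

    # NOTE: this field order is the bug in the story: `answer` is declared before `reasoning`.
    class MathAnswer(BaseModel):
        answer: int
        reasoning: str

    # Instructor patches the OpenAI client so the response is parsed and
    # validated against the Pydantic model passed as response_model.
    client = instructor.from_openai(OpenAI())

    result = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name, not from the original story
        response_model=MathAnswer,
        messages=[{"role": "user", "content": "What is 23 + 46?"}],  # made-up example question
    )
    print(result)  # MathAnswer(answer=..., reasoning=...)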
This worked in most cases, but once in a while it produced very strange results: the answer flatly contradicted the reasoning that came with it.

The actual implementation was much more complicated, with many complex properties, a lot of inserted context, and a long, engineered prompt, and it only happened a few times, so it took me hours to figure out whether it was caused by a programming bug or just the LLM's randomness.

It turned out that, because I had defined MathAnswer with `answer` before `reasoning`, the model generated its output in that same order, so the thinking process didn't influence the answer at all: something like `{"answer": 67, "reasoning": "..."}` instead of `{"reasoning": "...", "answer": 69}`. I just changed the order of the model's properties and the problem was gone.
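The fix, sketched with the same assumed model: declare `reasoning` before `answer`, so the model has to generate its chain of thought before it commits to a number.

    # Fixed field order: the reasoning is generated first,
    # so the answer is conditioned on it.
    class MathAnswer(BaseModel):
        reasoning: str
        answer: int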
[1] https://python.useinstructor.com/#what-is-instructor

ETA: Codex and Claude Code only said how shit my prompt and RAG system were, then suggested how to improve them, but that only made the problem worse. They really don't know how they work.