hamasho 4 hours ago

Story time.

I used Python's Instructor[1], a package that forces the model output to match a predefined Pydantic model. It's used as in the example below, and the output is guaranteed to fit the model.

    import instructor
    from pydantic import BaseModel

    class Person(BaseModel):
        name: str
        age: int

    client = instructor.from_provider("openai/gpt-5-nano")
    person = client.create(
        response_model=Person,
        messages=[{"role": "user", "content": "Extract: John is a 30-year-old"}]
    )
    print(person)
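Running this should print a validated Pydantic instance, something like (Pydantic v2's default str output; the exact form may vary by version):

    name='John' age=30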
I defined a response model for a chain-of-thought prompt, with the answer and its reasoning as fields, then asked questions.

    class MathAnswer(BaseModel):
        value: int
        reasoning: str

    answer = client.create(
        response_model=MathAnswer,
        messages=[{"role": "user", "content": "What's the answer to 17*4+1? Think step by step"}]
    )
    print(f"answer={answer.value}, {answer.reasoning}")
This worked in most cases, but once in a while, it produced very strange results:

    67, First I calculated 17*4=68, then I added 1 so the answer is 69
The actual implementation was much more complicated, with many complex properties, a lot of inserted context, and a long, heavily engineered prompt, and the bug appeared only a few times, so it took me hours to figure out whether it was caused by a programming bug or just LLM randomness.

It turned out that, because I defined MathAnswer with the fields in that order, the model generated its output in the same order: it put `reasoning` after `value`, so the thinking process couldn't influence the answer. The output was `{"value": 67, "reasoning": "..."}` instead of `{"reasoning": "...", "value": 69}`. I just changed the order of the model's properties and the problem was gone.

    class MathAnswer(BaseModel):
        reasoning: str
        value: int
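A quick way to sanity-check which order the schema advertises (assuming Pydantic v2; the exact dict contents may differ slightly by version):

    print(MathAnswer.model_json_schema()["properties"])
    # {'reasoning': {'title': 'Reasoning', 'type': 'string'},
    #  'value': {'title': 'Value', 'type': 'integer'}}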
[1] https://python.useinstructor.com/#what-is-instructor

ETA: Codex and Claude Code only said how shit my prompt and RAG system were, then suggested how to improve them, but their suggestions only made the problem worse. They really don't know how they themselves work.