madars 6 hours ago

Neat! As a human, one can recognize that this question embeds a computation and use a standard trick: explicitly ask the LLM to use a program to generate the answer. LLMs are great at generating code but not necessarily that great at executing it "in their head" (e.g., "what's the numeric integral of foo?" vs. "write a Python program that computes foo"). Models notice some instances of this themselves (I guess by now they all know that they are bad calculators, so they would whip out code to do multiplication), but a lot of cases still remain. Concretely, Claude 3.7 with "How is ahnentafel number 67 related to me? Use a program to help you." gets to "your father's father's father's father's mother's mother", whereas without the hint it indeed trips up on arithmetic and logic errors.
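
For reference, here is a minimal sketch of the kind of program that prompt might lead to. It is not Claude's actual output, and the function name `ahnentafel_relation` is my own. It assumes the usual ahnentafel convention: you are number 1, the father of person n is 2n, and the mother is 2n + 1, so the binary digits after the leading 1 spell out the path (0 = father, 1 = mother).

```python
def ahnentafel_relation(n: int) -> str:
    """Describe how ahnentafel number n is related to the subject (number 1)."""
    if n < 1:
        raise ValueError("ahnentafel numbers start at 1")
    if n == 1:
        return "you"
    # bin(67) == '0b1000011'; drop the '0b' prefix and the leading 1,
    # which stands for the subject. The remaining bits are the path.
    path_bits = bin(n)[3:]
    steps = ["father" if bit == "0" else "mother" for bit in path_bits]
    return "your " + "'s ".join(steps)


print(ahnentafel_relation(67))
# your father's father's father's father's mother's mother
```

67 is 1000011 in binary, so the path is four fathers followed by two mothers, which matches the answer quoted above.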