Remix.run Logo
mentalgear 2 hours ago

I think therein lies another fun benchmark to show that LLM don't generalize: ask the llm to solve the same logic riddle, only in different languages. If it can solve it in some languages, but not in others, it's a strong argument for just straightforward memorization and next token prediction vs true generalization capabilities.