ph4rsikal 2 days ago

It might appear so, but you can check it with a simple test. If the LLM plays a 4x4 Tic Tac Toe game, does the agent select the winning move 100% of the time, and block the opponent's winning move 100% of the time? If these systems were capable of proper reasoning, they would find the right choice in these obvious but constantly changing scenarios without being specifically trained for it.
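A rough sketch of how such a check could be scored, in case it helps. Everything here is my own assumption rather than anything from the linked post: the win condition (four in a row along a row, column, or main diagonal), the board encoding, and the helper names `completing_moves`, `expected_moves`, and `is_correct` are all hypothetical. The LLM's chosen move would be passed in as a `(row, col)` tuple.

```python
from typing import List, Set, Tuple

N = 4  # board size; assumes a full line of four is needed to win

Board = List[List[str]]  # each cell is "X", "O", or "" for empty
Cell = Tuple[int, int]


def winning_lines() -> List[List[Cell]]:
    """All rows, columns, and the two main diagonals of the N x N board."""
    lines = [[(r, c) for c in range(N)] for r in range(N)]
    lines += [[(r, c) for r in range(N)] for c in range(N)]
    lines.append([(i, i) for i in range(N)])
    lines.append([(i, N - 1 - i) for i in range(N)])
    return lines


def completing_moves(board: Board, player: str) -> Set[Cell]:
    """Empty cells that would complete a full line for `player`."""
    moves: Set[Cell] = set()
    for line in winning_lines():
        marks = [board[r][c] for r, c in line]
        if marks.count(player) == N - 1 and marks.count("") == 1:
            moves.add(line[marks.index("")])
    return moves


def expected_moves(board: Board, player: str, opponent: str) -> Set[Cell]:
    """Winning moves if any exist, otherwise the moves that block the opponent."""
    wins = completing_moves(board, player)
    return wins if wins else completing_moves(board, opponent)


def is_correct(board: Board, player: str, opponent: str, move: Cell) -> bool:
    """True if `move` is a forced win/block, or if no such move exists."""
    expected = expected_moves(board, player, opponent)
    return not expected or move in expected
```

You would generate many random positions that contain a forced win or a forced block, ask the model for a move on each, and report the fraction where `is_correct` returns True. Positions with no forced move count as correct here only because they are outside what the test is measuring.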

[1] https://jdsemrau.substack.com/p/nemotron-vs-qwen-game-theory...