Remix.run Logo
stavros a day ago

To clarify, the parent here didn't actually give the model a way to run the commands. The model just wrote the script/command and then, being unable to run anything, just mentally calculated what the result would probably be (and got it wrong).

Yes the answer was wrong, but so was the setup (the model should have had access to a command runner tool).

neonstatic a day ago | parent | next [-]

Yes, you are right that for a model that wants to use tools, the environment was wrong. I didn't do that on purpose. I was simply interested in seeing what the answer to my question would be. The fact Gemma 4 wanted to use tools was a bit of a surprise to me - the Qwen model also can use tools, but it opted not to.

I think it is interesting to see, that when forced to derive the value on its own, Gemma gets it wrong while Qwen gets it right (although in a very costly way).

I also think that not using tools is better than hallucinating using them.

stavros a day ago | parent [-]

I'm not judging, just clarifying for others who might think that the model did actually run the tools (like I did initially).

notnullorvoid 19 hours ago | parent | prev [-]

Regardless of setup the LLM shouldn't hallucinate tool use.