credit_guy 4 hours ago

I just tested it with a slightly tricky question

  > If you could run a nuclear reactor with U-235 as fuel or Pu-241 (both mixed with 95% U-238), which one would you choose and why? 
For a human this would not be tricky at all. For an LLM it could be, because this question certainly does not appear in any training data: Pu-241 does not exist in pure form. It only exists as a minor component of reactor-grade plutonium, where Pu-239 dominates, with Pu-240 coming second and Pu-241 third.

In any case, LongCat-2.0 gave a very well-reasoned but incorrect answer that Pu-241 is preferable.

I then tested on Qwen 3.7 Plus, and it correctly answered that U-235 is preferable because of its much higher delayed neutron fraction. I then went to Gemini Flash, which answered the same, with much more confidence, and with much stronger arguments, and the speed of the answer was much higher.

Overall I rate Gemini Flash the best, Qwen 3.7 Plus an acceptable second, and LongCat-2.0 an ok'ish third, if you have nothing better.

3eb7988a1663 4 hours ago | parent | next [-]

I am not a physicist but perhaps your question was leading more than you expected? I would take the question to pre-suppose I have an abundance of the stated material, ignoring practical realities of refinement. If I did have fully pure Pu-241, would that be a better fuel than U-235?

Or stated another way, "If you could run a generator on gasoline or jet fuel, which one would you choose and why?" I would answer jet fuel owing to slightly higher energy density and purity of the material - likely leading to a cleaner burn. Which would ignore that jet fuel is going to be a multiple of the gasoline price.

onion2k 2 hours ago | parent [-]

  > If I did have fully pure Pu-241, would that be a better fuel than U-235?

Also not a physicist, but I assume from the fact that the OP is asking the LLM this question to trip it up, the point is that U-235 is better even if you have an abundance of both. The scarcity of Pu-241 is what leads to the lack of training data; it isn't that Pu-241 is actually better.

3eb7988a1663 37 minutes ago | parent [-]

Again, really speaking out of my depth, but if there is a lack of plutonium training data, I would assume the answer would be the far more commonly described U-235. To respond otherwise means there is some existing association with Pu-241 being better.

bel8 11 minutes ago | parent | prev | next [-]

A fairer and more useful comparison would be to give both LLMs documentation about such niche knowledge in the context, then ask.

icepush 4 hours ago | parent | prev [-]

Did you ask the question several times in fresh chat contexts to see if it sometimes gives the right answer?
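
For what it's worth, here is a rough sketch of what "asking several times" could look like in practice. Everything here is an assumption for illustration: `ask` is a hypothetical callable that sends the prompt in a fresh context and returns the reply text, and `is_correct` is whatever check you trust (say, "does the answer pick U-235"). It reports the hit rate with a 95% Wilson score interval, which gives a sense of how little a single trial pins down.

```python
import math
from typing import Callable, Tuple

def correct_rate(
    ask: Callable[[str], str],          # hypothetical: one fresh-context call per invocation
    prompt: str,
    is_correct: Callable[[str], bool],  # hypothetical grader for a single reply
    n: int = 20,
) -> Tuple[int, float, float, float]:
    """Ask the same prompt n times and return (hits, rate, low, high).

    low/high are the bounds of a 95% Wilson score interval for the
    underlying probability of a correct answer.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    hits = sum(1 for _ in range(n) if is_correct(ask(prompt)))
    p = hits / n
    z = 1.96  # normal quantile for a 95% interval
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return hits, p, max(0.0, centre - half), min(1.0, centre + half)
```

With `n=1` the interval stays very wide whichever way the single answer goes, so a one-shot test says little about how often a model gets the question right.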

zythyx 3 hours ago | parent [-]

Nah, n=1 is enough to give evidence that something is entirely broken, of course.

/s

3eb7988a1663 40 minutes ago | parent [-]

Well, when we had deterministic tools, it would only take a single example of a calculator claiming 1+1=4 for me to throw it in the trash.