The appendix lists the equations transcribed from the raw answers.

  LLM                          T(t)                                        Cost
  Kimi K2.5 (reasoning)        20 + 52.9 exp(-t/3600) + 27.1 exp(-t/80)    $0.01
  Gemini 3.1 Pro               20 + 53 exp(-t/2500) + 27 exp(-t/149.25)    $0.09
  GPT 5.4                      20 + 54.6 exp(-t/2920) + 25.4 exp(-t/68.1)  $0.11
  Claude 4.6 Opus (reasoning)  20 + 55 exp(-t/1700) + 25 exp(-t/43)        $0.61 (eeek)
  Qwen3-235B                   20 + 53.17 exp(-t/1414.43)                  $0.009
  GLM-4.7 (reasoning)          20 + 53.2 exp(-t/2500)                      $0.03
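
For anyone who wants to compare the curves numerically, here is a minimal Python sketch that encodes the equations from the table and evaluates them at a few times. The names `MODELS`, `AMBIENT` and `temperature` are made up for this example, and the units (t in seconds, T in degrees Celsius) are an assumption, since the table does not state them.

```python
import math

# (amplitude, time constant) pairs per model, copied from the table above.
# Assumption: t is in seconds and T in degrees Celsius; the table gives no units.
MODELS = {
    "Kimi K2.5 (reasoning)": [(52.9, 3600.0), (27.1, 80.0)],
    "Gemini 3.1 Pro": [(53.0, 2500.0), (27.0, 149.25)],
    "GPT 5.4": [(54.6, 2920.0), (25.4, 68.1)],
    "Claude 4.6 Opus (reasoning)": [(55.0, 1700.0), (25.0, 43.0)],
    "Qwen3-235B": [(53.17, 1414.43)],
    "GLM-4.7 (reasoning)": [(53.2, 2500.0)],
}
AMBIENT = 20.0  # constant term shared by every equation in the table


def temperature(terms, t):
    """Evaluate T(t) = AMBIENT + sum(a * exp(-t / tau)) for one model."""
    return AMBIENT + sum(a * math.exp(-t / tau) for a, tau in terms)


if __name__ == "__main__":
    times = (0, 60, 300, 1800)
    for name, terms in MODELS.items():
        row = "  ".join(f"{temperature(terms, t):6.1f}" for t in times)
        print(f"{name:30s} {row}")
```

Evaluating at t = 0 is a quick sanity check: the four two-exponential answers start at 100, while the two single-exponential answers start near 73.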

▲ kurthr 6 hours ago | parent [-]

It looks like a lot of them are missing something big. I'd think the two big ones are the evaporative cooling as you pour into the cup, and heating up the cup (by convection) itself. The convective cooling to the air is tertiary, but important (and conduction of the mug to the table probably isn't completely negligible). If there's only one exponential, they're definitely doing something wrong.

I'd like to see a sensitivity study showing how much those terms would need to change to match within a few %. Exponentials are really tweaky!
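
A rough way to start on that kind of sensitivity study is to bump each parameter by a fixed fraction and see how far T(t) moves. The Python sketch below does this for the GPT 5.4 equation from the table; `BASE`, `temperature` and `sensitivity` are hypothetical names, the 10% step is arbitrary, and t is assumed to be in seconds. It is a one-at-a-time perturbation, not a fit against measured data.

```python
import math

# Parameters of the GPT 5.4 row: 20 + 54.6 exp(-t/2920) + 25.4 exp(-t/68.1).
BASE = {"a1": 54.6, "tau1": 2920.0, "a2": 25.4, "tau2": 68.1}


def temperature(t, a1, tau1, a2, tau2, ambient=20.0):
    """Two-exponential cooling curve in the form used in the table."""
    return ambient + a1 * math.exp(-t / tau1) + a2 * math.exp(-t / tau2)


def sensitivity(t, rel_step=0.10):
    """Change in T(t) when each parameter alone is scaled up by rel_step."""
    base = temperature(t, **BASE)
    shifts = {}
    for name in BASE:
        bumped = dict(BASE)
        bumped[name] *= 1.0 + rel_step
        shifts[name] = temperature(t, **bumped) - base
    return shifts


if __name__ == "__main__":
    for t in (0, 60, 300, 1800):
        shifts = sensitivity(t)
        line = "  ".join(f"{k}: {v:+6.2f}" for k, v in shifts.items())
        print(f"t={t:5d}  {line}")
```

At t = 0 the time constants have no effect at all, since exp(0) = 1 whatever tau is; their influence only shows up later in the curve.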

▲ andai 5 hours ago | parent [-]

Is that what that first drop is? The cold cup stealing heat from the coffee?

▲ kadoban 5 hours ago | parent [-]

It's a mix of course, but I think it should be mainly that and evaporative cooling. Evap is _very_ effective but will fall off rapidly as you get away from boiling. The conduction into the mug will depend a lot on the mug material but will slow down a lot as the mug approaches the water temperature. I'd be very interested in seeing separate graphs for each major component and how they add up to the total. Even asking the LLMs to separate it out might improve some of their results; it would be interesting to try that too.
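
As a starting point for the per-component view, the Python sketch below splits one of the fitted curves into its terms and reports how long the fast term stays above a given size. It uses the Claude 4.6 Opus equation from the table; `SLOW`, `FAST`, `breakdown` and `time_to_fall_below` are hypothetical names, t is assumed to be in seconds, and the equations themselves do not say which physical effect (mug warm-up, evaporation, convection) each exponential stands for.

```python
import math

# Terms of the Claude 4.6 Opus row: 20 + 55 exp(-t/1700) + 25 exp(-t/43).
AMBIENT = 20.0
SLOW = (55.0, 1700.0)  # (amplitude, time constant) of the slow term
FAST = (25.0, 43.0)    # (amplitude, time constant) of the fast term


def term(amplitude, tau, t):
    """Value of a single exponential term at time t."""
    return amplitude * math.exp(-t / tau)


def breakdown(t):
    """Split T(t) into its constant, slow and fast contributions."""
    slow = term(*SLOW, t)
    fast = term(*FAST, t)
    return {"ambient": AMBIENT, "slow": slow, "fast": fast,
            "total": AMBIENT + slow + fast}


def time_to_fall_below(amplitude, tau, threshold):
    """Time at which amplitude * exp(-t / tau) has decayed to threshold."""
    return tau * math.log(amplitude / threshold)


if __name__ == "__main__":
    for t in (0, 30, 120, 600):
        print(t, {k: round(v, 2) for k, v in breakdown(t).items()})
    print("fast term below 1 degree after",
          round(time_to_fall_below(*FAST, 1.0), 1), "s")
```

Plotting the `slow` and `fast` columns against t would give the separate curves, with `total` as their sum on top of the constant term.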