| ▲ | gchamonlive 3 hours ago | ||||||||||||||||||||||
How should one conduct such a rigourously reproducible experiment when LLMs by nature aren't deterministic and when you don't have access to the model you are comparing to from months ago? | |||||||||||||||||||||||
| ▲ | Retr0id 3 hours ago | parent [-] | ||||||||||||||||||||||
Something like this: https://marginlab.ai/trackers/claude-code/ (see methodology section) | |||||||||||||||||||||||
| |||||||||||||||||||||||