tweakimp 6 hours ago
Every time I see a table like this, the numbers go up. Can someone explain what this actually means? Is it just an incremental improvement, where some tests are solved a bit better, or is this a breakthrough where this model can do something all the others cannot?
rvnx 6 hours ago
This is a list of questions and answers that were created by many different people. The questions AND the answers are public. If the LLM manages, through reasoning OR memory, to repeat back the answer, it wins. The scores represent the percentage of correct answers it recalled.
stavros 5 hours ago
I estimate another 7 months before models start getting 115% on Humanity's Last Exam.