golergka 16 hours ago
What area is this problem from? In general, which areas did you find useful for creating such benchmarks? Maybe instead of sharing (and leaking) these prompts, we could share methods for creating them.
mobilejdral 13 hours ago | parent
Think of questions where there is a ton of existing medical research but no clear answer yet. For example, there are a dozen Alzheimer's questions you could ask that would require it to pull half a dozen contradictory sources together into a plausible hypothesis. If you have studied Alzheimer's extensively, it is trivial to evaluate the responses. One question about Alzheimer's is one of my go-to questions. I am testing its ability to reason.
henryway 16 hours ago | parent
Can God create something so heavy that he can’t lift it?