golergka 16 hours ago

What area is this problem from? What areas in general did you find useful for creating such benchmarks?

Maybe instead of sharing (and leaking) these prompts, we can share methods for creating them.

mobilejdral 13 hours ago

Think of questions where there is a ton of existing medical research but no clear answer yet. There are a dozen Alzheimer's questions you could ask, for example, that would require it to pull half a dozen contradictory sources into a plausible hypothesis. If you have studied Alzheimer's extensively, it is trivial to evaluate the responses. A question about Alzheimer's is one of my go-to questions. I am testing its ability to reason.

henryway 16 hours ago

Can God create something so heavy that he can’t lift it?

viraptor 14 hours ago

There's so much text on this already that it's unlikely to engage any reasoning at all. Or more specifically, if you got a few existing answers from philosophy mashed together, you wouldn't be able to tell that apart from reasoning anyway.

abc-1 16 hours ago

https://chatgpt.com/share/680ae04a-e360-8004-88fc-8426e8e700...