stephantul 7 hours ago
1) Yes! It’s not accuracy, but NDCG. 2) We assume that if the agent gets the correct answer in the returned snippets, it does not need to read further.
esafranchik 7 hours ago
Wouldn't NDCG/token results vary wildly depending on the agent's query and the number of returned items? E.g., agents often run `grep -m 5 "QUERY"` with different queries, instead of one big grep for all items.
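
To make the concern concrete, here is a minimal sketch of how NDCG shifts with the number of returned items. It assumes binary relevance labels and the standard log2 discount; the `ranked` list and the `dcg` and `ndcg_at_k` helpers are hypothetical illustrations, not the benchmark's actual scoring code.

```python
import math

def dcg(relevances):
    # Discounted cumulative gain: the item at 0-based rank r is discounted by log2(r + 2).
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg_at_k(relevances, k):
    # Normalize DCG of the top-k results by the DCG of an ideal ordering of the same labels.
    ideal_dcg = dcg(sorted(relevances, reverse=True)[:k])
    if ideal_dcg == 0:
        return 0.0
    return dcg(relevances[:k]) / ideal_dcg

# Hypothetical ranking: 1 = relevant snippet, 0 = irrelevant snippet.
ranked = [0, 0, 1, 0, 0, 1, 0, 0]

# The same ranking scores very differently depending on the cutoff.
for k in (2, 5, 8):
    print(k, round(ndcg_at_k(ranked, k), 3))
```

With the same ranking, a cutoff of 2 misses both relevant snippets while larger cutoffs pick them up, so the score (and any per-token normalization of it) depends heavily on how many items each call returns.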