modeless 6 days ago
There are different scores reported by Google for "diff" and "whole" modes, and the others were "diff", so I chose the "diff" score. Hard to make a real apples-to-apples comparison.
jsnell 6 days ago
The 73% on the current leaderboard is using "diff", not "whole". (Well, diff-fenced, but the difference is just the location of the filename.)
tcdent 6 days ago
They just pick the best performer out of the built-in modes they offer. It's an interesting data point about the model's behavior, but even more so it's a recommendation of how to configure the model for optimal performance. I do consider this to be an apples-to-apples benchmark, since they're evaluating real-world performance.