dfajgljsldkjag 13 hours ago
I am always skeptical of benchmarks that show perfect scores, especially when they come from the company selling the product. It feels like everyone claims to have solved conversational timing these days. I guess we will see if it is actually any good.
bpanahij 2 hours ago
You should be skeptical, and try it out. I selected 28 long conversations for our evaluation set, all unseen audio. Every turn-taking model makes tradeoffs, and I tried to make the best tradeoffs for each model by adjusting and tuning the implementations. As the creator of Sparrow, I'm certainly not in a position to be totally objective. However, we did use unaltered real conversational audio for evaluation, and I tried to find examples that would challenge Sparrow-1, with lots of variation in speaker style across the conversations.
fudged71 13 hours ago
Different industry, but our marketing guy once said: "You know what this [perfect] metric means? We can never use it in marketing, because it's not believable."