devanshp 18 hours ago:
Cool post! I'm curious whether the data quality scoring has actually translated into better data; do you have numbers on how much more of your data is useful for training now versus in May?
rio-popper 18 hours ago (reply):
So the real-time neural quality checking was the most important thing here. Before we rewrote the backend, between 58% and 64% of participant hours were actually usable data; now it's between 90% and 95%.

If you mean the text quality scoring system: when we added that, it improved the amount of text we got per hour of neural data by 30-35%. (That includes the fact that we filter which participants we invite back based on their text quality scores.)
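Roughly, the filtering works like this (a minimal sketch; the names, the score definition, and the cutoff are illustrative, not our actual pipeline):

    from dataclasses import dataclass

    @dataclass
    class Session:
        participant_id: str
        neural_hours: float   # hours of neural data recorded
        usable_hours: float   # hours that passed real-time quality checks
        text_chars: int       # characters of text collected alongside the neural data

    def text_quality_score(s: Session) -> float:
        # Text yield per usable hour; a stand-in for the real scoring function.
        return s.text_chars / s.usable_hours if s.usable_hours else 0.0

    def participants_to_invite_back(sessions: list[Session], cutoff: float) -> list[str]:
        # Only participants whose text quality clears the cutoff are asked to return.
        return [s.participant_id for s in sessions if text_quality_score(s) >= cutoff]

    def usable_fraction(sessions: list[Session]) -> float:
        # The 58-64% -> 90-95% numbers above are this quantity.
        return sum(s.usable_hours for s in sessions) / sum(s.neural_hours for s in sessions)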