devanshp 18 hours ago

Cool post! I'm somewhat curious whether the data quality scoring has actually translated into better data; do you have numbers on how much more of your data is useful for training vs in May?

rio-popper 18 hours ago

So the real-time neural quality checking was the most important thing here. Before we rewrote the backend, only 58-64% of participant hours were actually usable data. Now it's 90-95%.
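
To make the mechanism concrete, here's a minimal sketch of what a real-time usability gate over incoming signal windows could look like. Everything in it (function names, thresholds, the specific checks) is a hypothetical illustration, not the actual pipeline described above:

    import numpy as np

    def window_is_usable(window: np.ndarray,
                         flat_std_uv: float = 0.5,
                         max_abs_uv: float = 500.0,
                         clip_frac_max: float = 0.02) -> bool:
        """Hypothetical quality gate for one (channels, samples)
        window of neural data, in microvolts."""
        # Dead/disconnected electrode: near-zero variance on any channel.
        if np.any(window.std(axis=1) < flat_std_uv):
            return False
        # Saturation or gross motion artifact: too many extreme samples.
        clipped_fraction = np.mean(np.abs(window) >= max_abs_uv)
        return bool(clipped_fraction <= clip_frac_max)

    def usable_fraction(windows) -> float:
        # Fraction of recorded windows that pass the gate -- the
        # "usable participant hours" figure quoted above.
        flags = [window_is_usable(w) for w in windows]
        return sum(flags) / max(len(flags), 1)

Checking windows as they stream in (rather than scoring sessions after the fact) is what lets bad recordings be caught and fixed while the participant is still in the chair.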

If you mean the text quality scoring system: when we added that, it improved the amount of text we got per hour of neural data by 30-35%. (That includes the fact that we filter which participants we bring back based on their text quality scores.)
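
For that filtering step, here's a sketch of how a per-session text-quality score might feed a return-invite cutoff. Again, the scoring formula, names, and cutoff are assumptions for illustration only, not the system described:

    from statistics import mean

    def session_text_score(words_accepted: int, words_attempted: int,
                           neural_hours: float) -> float:
        """Hypothetical score: accepted words per hour of neural
        data, discounted by the acceptance rate."""
        if neural_hours <= 0 or words_attempted == 0:
            return 0.0
        acceptance = words_accepted / words_attempted
        return acceptance * (words_accepted / neural_hours)

    def invite_back(scores: list[float], cutoff: float = 100.0) -> bool:
        # Only participants whose average session score clears the
        # cutoff are asked to return, per the filtering mentioned above.
        return bool(scores) and mean(scores) >= cutoff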