Remix clone Hacker News

	pvankessel 4 days ago
		I used MTurk heavily in its heyday for data annotation - it was an invaluable tool for collecting training data for large-scale research projects, and I honestly have to credit it with enabling most of my early career triumphs. We labeled and classified hundreds of thousands of tweets, Facebook posts, news articles, YouTube videos - you name it. Sure, there were bad actors who gave us fake data, but with the right qualifications and timing checks, and if you assigned multiple Turkers (3-5) to each task, you could get very reliable results, with inter-rater reliability that matched that of experts. Wisdom of the crowd, or the law of averages, I suppose. Paying a living wage also helped - the community always got extremely excited when our HITs dropped and was very engaged, and I loved getting thank-yous and insightful clarifying questions in our inbox. For most of this kind of work, I now use AI and get comparable results, but back in the day, MTurk was pure magic if you knew how to use it to its full potential. Truthfully, I really miss it - hitting a button to launch 50k HITs and seeing the results slowly pour in overnight (and frantically spot-checking them to make sure you weren't setting $20k on fire) was about as much of a rush as you can get in the social science research world.
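For anyone curious what "timing checks plus 3-5 Turkers per task" can look like in practice, here is a rough sketch in Python. It is not the commenter's actual pipeline: the function name `aggregate_labels`, the tuple layout, and the 5-second cutoff are all assumptions made up for illustration. It drops judgments that were submitted suspiciously fast, then takes a majority vote per item and reports what share of the remaining workers agreed.

```python
from collections import Counter

def aggregate_labels(judgments, min_seconds=5.0):
    """Majority-vote labels per item after a simple timing check.

    judgments: iterable of (item_id, worker_id, label, seconds_spent).
    min_seconds: hypothetical cutoff; faster submissions are discarded.
    Returns {item_id: (winning_label, share_of_kept_votes)}.
    """
    by_item = {}
    for item_id, worker_id, label, seconds in judgments:
        if seconds < min_seconds:
            continue  # timing check: likely a careless or fake answer
        by_item.setdefault(item_id, []).append(label)

    results = {}
    for item_id, labels in by_item.items():
        # On a tie, Counter keeps the label that was seen first.
        top_label, top_count = Counter(labels).most_common(1)[0]
        results[item_id] = (top_label, top_count / len(labels))
    return results

# Made-up example data, 3-4 workers per item.
judgments = [
    ("tweet-1", "w1", "positive", 22.0),
    ("tweet-1", "w2", "positive", 18.5),
    ("tweet-1", "w3", "negative", 1.2),   # too fast, dropped
    ("tweet-1", "w4", "negative", 30.1),
    ("tweet-2", "w1", "neutral", 15.0),
    ("tweet-2", "w2", "neutral", 12.3),
    ("tweet-2", "w3", "neutral", 19.9),
]
labels = aggregate_labels(judgments)
```

Items with a low agreement share are the ones worth spot-checking by hand or sending out for more judgments. A real setup would likely use a proper reliability statistic rather than raw agreement, so treat this only as a starting point.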