yvdriess 3 days ago

One of the key innovations behind the DNN/CNN models was Mechanical Turk. OpenAI used a similar system extensively to improve the early GPT models. I would not be surprised if the practice continues today; NN models need a lot of quality ground-truth training data.

simonw 3 days ago

Right, but where are the details?

Given the number of labs that are competing these days on "open weights" and "transparency" I'd be very interested to read details of how some of them are handling the human side of their model training.

I'm puzzled at how little information I've been able to find.

esperent 3 days ago

I read this a few years ago.

Time Exclusive: OpenAI Used Kenyan Workers on Less Than $2 Per Hour to Make ChatGPT Less Toxic

https://time.com/6247678/openai-chatgpt-kenya-workers/

Beyond that, I think the reason you haven't heard more about it is that it happens in developing countries, so western media doesn't care much, and also because big AI companies work hard to distance themselves from it. They'll never be the ones directly employing these AI sweatshop workers; it's all contracted out.

conradkay 3 days ago

Good article from 2023, not much data though if that's what you're looking for:

https://nymag.com/intelligencer/article/ai-artificial-intell...

unwalled: https://archive.ph/Z6t35

Generally seems similar today just on a bigger Scale. And much more focus on coding

Here in the US, DataAnnotation seems to be the most marketed company offering these jobs.

ics 3 days ago

This is not going to be as deep/specific as you want, but a starting point from one of the companies that handles this sort of work is here: https://humandata.mercor.com/mercors-approach/black-box-vs-o...