Most AI tools require a huge amount of hidden labor to make them work at all. This massive effort goes beyond minding systems operating in real time to creating the data used to train those systems in the first place. The workers who do this perform a host of tasks: drawing green highlighting boxes around objects in the camera feeds of self-driving cars; rating how coherent, helpful, or offensive the responses from language models are; labeling whether social media posts include hate speech or violent threats; and determining whether people in sexually provocative videos are minors. These workers handle a great deal of toxic content. Because media synthesis machines recombine internet content into plausible-sounding text and legible images, companies require a screening process to prevent their users from seeing the worst of what the web has to offer.

This industry has gone by many names: “crowdwork,” “data labor,” or “ghost work” (so called because the labor often goes unnoticed and unseen by consumers in the West). But the work is very visible to those who perform it. Jobs in which low-paid workers filter out, correct, or label text, images, videos, and sounds have been around for nearly as long as AI itself, and certainly for as long as the current era of deep learning methods. It’s not an exaggeration to say that we wouldn’t have the current wave of “AI” without the availability of on-demand laborers.