Exposing the Rotten Reality of AI Training Data

In a report released December 20, 2023, the Stanford Internet Observatory said it had detected more than 1,000 instances of verified child sexual abuse imagery in a major dataset used to train generative AI systems such as Stable Diffusion 1.5. This troubling discovery builds on prior research into the “dubious curation” of large-scale datasets used to train AI systems, and raises concerns that such content may have contributed to the ability of AI image generators to produce realistic counterfeit images of child sexual exploitation, in addition to other harmful and biased material. Justin Hendrix spoke with the report’s author, Stanford Internet Observatory Chief Technologist David Thiel.

About the Podcast

Tech Policy Press is a nonprofit media and community venture intended to provoke new ideas, debate and discussion at the intersection of technology and democracy. The Sunday Show is its podcast. You can find us at https://techpolicy.press/, where you can join the newsletter.