Fair Use for Fairer AI

How Copyright Law Can Fix Artificial Intelligence's Implicit Bias Problem


Using copyrighted works as training data for AI systems is not only a fair use, but one that can quite literally promote fairness.

The problem of biased data producing biased results is not new; it is as old as the first computer. AI systems trained on vast amounts of data are now used by our banks and bosses, our computers and criminal justice system, so it is crucial to understand why these systems seem to reflect, amplify, and perpetuate human bias rather than eliminate it. Scholars have long examined the complex legal and ethical questions posed by collecting, storing, and processing the quantities of "Big Data" required to train AI. There is a robust body of scholarship, and even entire conferences, dedicated to reducing bias and enhancing the fairness of AI. Absent from the conversation, however, are analyses from copyright scholars of how our legal framework inadvertently biases who can access and use certain works as training data.

This Article, still in progress, is the first to address how copyright law channels AI in a fundamentally biased direction by advantaging established companies and privileging biased data. It suggests that using copyrighted works as training data to mitigate bias is a fair use.

An early draft was presented at We Robot 2017, held at Yale Law School.