
Clifford Chance

IP Insights


Training Generative AI: Tensions Emerging Between New Data Markets and Fair Use

Written with Dan Pelo, Legal Technology Advisor at Clifford Chance.

In a highly anticipated, first-of-its-kind decision on training generative AI using copyrighted works, the U.S. District Court for the Northern District of California has held in Bartz et al. v. Anthropic PBC, No. 3:24-cv-05417 (N.D. Cal. June 23, 2025) (Anthropic) that some (but not all) categories of content used to train large language models (LLMs) could benefit from the "fair use" defense to copyright infringement.

Each application of the "fair use" principles is fact-specific, and it is notable that Anthropic trained its LLMs on digitized versions of books. The Anthropic decision does not address the use of content scraped from the web in AI model training.

Just days after the Anthropic decision, the same court issued a separate ruling on similar issues in Kadrey v. Meta. Unlike Anthropic, Kadrey was not a class action, which limits its impact to the named plaintiffs. The Kadrey decision is beyond the scope of this article; we note only that the court in Kadrey also upheld a "fair use" defense in relation to certain aspects of AI training, albeit for different reasons from the court in Anthropic.

In U.S. copyright law, "fair use" is a doctrine that allows limited use of copyrighted works without the owner’s permission. Four factors determine whether a use is considered fair: (1) the purpose and character of the use (for example, noncommercial or transformative uses are favored), (2) the nature of the original copyrighted work (using factual material is more likely to be fair than using highly creative works), (3) the amount and substantiality of the portion used (using smaller, less significant excerpts weighs toward fair use), and (4) the effect of the use on the potential market for or value of the original work (uses that do not harm the creator’s ability to profit are more likely to be fair). Factors (1) and (4) are often the most important in the analysis.

The Anthropic decision serves as a bellwether for the industry and traces emerging issues and tensions with potential data markets. These tensions become clearer when the decision's treatment of factors (1) and (4) is compared with the U.S. Copyright Office's recent report on generative AI training (AI Report).

A slowly emerging trend is that generative AI may, on the one hand, dilute the market for copyright owners' works and, on the other hand, create new licensing opportunities for the wholesale consumption of copyrighted works as training data. Because other courts, assuming a similar set of facts, are likely to rule, as did the Court in Anthropic, that such licensing markets fall outside the markets that the Copyright Act of 1976, 17 U.S.C. § 101 et seq. (Copyright Act), is intended to protect, it is possible that policymakers will intervene to safeguard the U.S.'s competitive edge in the global market.

The Anthropic decision has made clear that the use of pirated works for training generative AI systems may be impermissible under fair use. Companies deploying such AI systems have a clearer pathway to establishing that their use of copyrighted works as training data is "fair": placing additional software between the user and the AI system to ensure that no copyrighted work reaches the user.

To help illustrate how the recent Anthropic decision aligns with, and diverges from, the U.S. Copyright Office’s recent report on generative AI, we’ve prepared a side-by-side comparison of two key fair use factors: Purpose and Character of the Use (Factor 1) and Effect on the Market (Factor 4). Please see the accompanying charts. This summary highlights areas of agreement (in green), neutrality (in yellow), and tension (in red) between the Court’s reasoning and the Copyright Office’s policy perspective.

For a more detailed version of this comparison, including additional context and commentary, please contact our team to request a copy.
