Fair Use and Generative AI
How the fair use clause is being applied to generative AI
Context: The use of copyrighted materials to train large language models (LLMs) has become a major legal and ethical issue in the AI industry. With lawsuits mounting across jurisdictions, two recent summary judgments in the U.S. offer critical insight into how courts may interpret the ‘fair use’ doctrine under copyright law in the context of AI training.
What Is Generative AI?
- Generative AI (GenAI) is a branch of artificial intelligence that focuses on creating new content—like text, images, audio, video, and even code—based on patterns learned from existing data.
- It uses deep learning models, especially transformer-based neural networks, to learn patterns and relationships in data, generate new outputs in response to user prompts, and improve performance over time through feedback and fine-tuning.
- It enhances creativity, saves time, personalises content, and democratises content creation. However, it also raises concerns over bias, misinformation, ethical misuse (e.g., deepfakes), copyright issues, and high computational costs.
What is the “fair use” doctrine under U.S. copyright law, and why is it significant in the context of generative AI?
- The fair use doctrine allows limited use of copyrighted material without the copyright holder’s permission. Courts weigh four key factors:
  - Purpose and character of the use, including whether the use is “transformative.”
  - Nature of the copyrighted work, with greater fair use leeway for factual works than for fiction or fantasy.
  - Amount and substantiality of the portion used, analysed both qualitatively and quantitatively.
  - Effect on the potential market or value of the copyrighted work, especially whether the use harms the economic prospects of the original.
- In the context of generative AI, this doctrine is being tested as AI models ingest massive volumes of copyrighted material during training, raising the question of whether such use qualifies as transformative or as infringement.
What are some recent U.S. court cases involving generative AI and fair use? What did they conclude?
Case 1: Andrea Bartz et al. v. Anthropic PBC
- Issue: Anthropic trained its LLMs (like Claude) using books from multiple sources, including print books it had purchased and digitised and copies obtained from potentially illegal sources.
- Court’s Decision: Training LLMs on the books was found to be transformative and to qualify as fair use, and the print-to-digital conversion of legally purchased books was also found to be fair use. Downloading and storing copies from illegal sources did not qualify as fair use; this issue will continue to be litigated.
Case 2: Richard Kadrey et al. v. Meta Platforms, Inc.
- Issue: Meta allegedly used books from illegal sources to train Llama.
- Court’s Decision: The use was found to be highly transformative. Plaintiffs failed to show evidence of market harm, so the court granted summary judgment in Meta’s favour. However, the court will continue proceedings regarding claims about unlawful distribution of books via torrenting.
How are different countries responding to AI training and copyright issues?
- European Union and UK: Allow limited text and data mining exceptions under specific conditions.
- South Africa: Considering adopting U.S.-style fair use provisions to promote AI research.
- India: Section 52 of the Copyright Act provides fair dealing exceptions (e.g., for research or education) but lacks AI-specific clauses.
What are the broader implications?
- Fair Use Favours AI Training: Both rulings recognise the transformative nature of GenAI training. This strengthens the case for fair use in LLM training, especially with legally obtained materials.
- Illegal Sources Pose Legal Risks: Downloading from unauthorised sources could nullify fair use claims.
- Market Harm Evidence is Crucial: Plaintiffs must provide empirical proof of market dilution or economic damage.
- Unsettled Legal Landscape: Each case depends heavily on facts and evidence. Legal standards are still evolving; future judgments may differ significantly.