AI Copyright Litigation: Training Data, AI Outputs, and Fair Use

1. What Is AI Copyright Litigation about?

Quick answer: AI copyright litigation concerns whether training generative AI on copyrighted works, or the content those systems produce, infringes copyright. The biggest issues are whether AI training is fair use, whether the way training data was obtained (for example, from pirated sources) matters, and whether AI outputs reproduce protected works. Early federal rulings have split, and higher courts have not yet resolved the key questions.

AI copyright litigation refers to the growing body of lawsuits in which copyright owners, including authors, artists, publishers, news organizations, and music companies, claim that AI developers infringed their works. The claims generally fall into two categories. The first concerns inputs: copying copyrighted works to assemble training datasets and to train the model. The second concerns outputs: whether what the AI generates reproduces or closely imitates the protected works included in training data. These are analytically distinct, and a case can succeed or fail on one without the other. The pivotal legal question in most of these cases is whether the copying involved in training qualifies as fair use, a defense that turns on a fact-specific, four-factor analysis rather than a bright-line rule.

These disputes sit at the intersection of copyright and AI and extend traditional copyright litigation to a new technology.

Issue	The Core Question	Why It Matters
Training inputs	Is copying works to train a model infringement?	Determines AI developers' core exposure
Data acquisition	Were works obtained lawfully or from piracy?	Can decide the case even if training is fair use
AI outputs	Does generated content reproduce protected works?	Shifts risk to businesses using the output
Fair use	Does the four-factor defense apply?	The pivotal issue in most cases
Market harm	Does the use harm the work's market?	Often the deciding fair use factor

What Is the Difference between Input and Output Claims?

Input and output claims target different stages of how AI works. Input claims focus on the training process: to build a model, developers copy large volumes of text, images, or other works into datasets, and copyright owners argue that this copying, done without permission, infringes their reproduction rights. Output claims focus on what the model produces: if a system generates content that reproduces or closely mimics a specific copyrighted work, that output may itself infringe.

The distinction has real consequences for who is exposed. Input claims primarily concern the AI developer that built and trained the model, while output claims can reach the businesses and individuals who use AI tools to generate and publish content. A company can face output liability for infringing material it publishes even where the underlying training was found lawful, which is why the two types of claims are best analyzed separately, much as in any copyright infringement lawsuit.

How Does Fair Use Apply to AI?

Fair use is the central defense in AI training cases. It is a doctrine under Section 107 of the Copyright Act that permits limited use of copyrighted material without permission, and courts evaluate it by weighing four factors: the purpose and character of the use, including whether it is commercial and whether it is transformative; the nature of the copyrighted work; the amount used; and the effect on the market for the original. No single factor is decisive, and the analysis is fact-specific.

In the AI context, developers argue that training is highly transformative, since the model learns statistical patterns rather than republishing the works, while rights holders argue that the copying is wholesale and harms emerging licensing markets. Courts have weighed these factors differently, so how fair use applies to a particular system depends on the specific facts of training, acquisition, and output.

2. What Have the Courts and Regulators Decided so Far?

A handful of federal trial court decisions in 2025 provided the first substantive guidance on AI copyright, and they did not all point the same way. Some found AI training to be fair use, at least on the facts before them, while another rejected the defense where the AI product competed directly with the copyrighted source. A consistent theme is that how the training data was acquired, and whether the use harms the market for the original, can matter as much as the transformative nature of training itself. These are trial-level rulings, several are subject to appeal, and many more cases are pending.

Regulators have weighed in as well. The U.S. Copyright Office's 2025 AI reports did not create a new statutory rule, but they framed the central issues: a Part 2 report addressed the copyrightability of AI outputs and the continuing requirement of human authorship, while a Part 3 report addressed generative AI training and fair use, organizing the debate around fair use, licensing markets, and the factual differences between training, data acquisition, and outputs. As of 2026, there is still no dedicated AI training exception comparable to some text-and-data-mining regimes abroad.

What Did the Anthropic and Meta Rulings Hold?

In two June 2025 decisions from the Northern District of California, federal judges found that training large language models on copyrighted books could be fair use, while differing in their reasoning. In Bartz v. Anthropic, the court held that using lawfully acquired books to train the model was highly transformative and qualified as fair use. It treated training use separately from the developer's acquisition and retention of pirated book copies, leaving the pirated-library conduct as a separate source of liability risk that later drove a reported settlement of about $1.5 billion.

In Kadrey v. Meta, the court granted the developer summary judgment on fair use, but on narrower grounds, resting heavily on the authors' failure to prove market harm and signaling that stronger evidence in a future case could change the outcome. Together the rulings suggest training may often be transformative, yet leave open how pirated sources and market-harm evidence will be treated, questions at the core of ongoing AI litigation.

When Have Courts Rejected Fair Use in AI-Related Copyright Cases?

Not every court has accepted a fair use defense. In Thomson Reuters v. ROSS Intelligence, a Delaware federal court ruled in early 2025 against fair use where a company used copyrighted legal research material to build a competing research tool. The case centered less on generative AI training than on using protected content to create a product that substituted for the original: the court emphasized that the resulting tool competed directly with the copyright owner's own product, weighing heavily against fair use under the market-harm factor.

This decision illustrates that the outcome depends heavily on the facts, especially whether the AI product substitutes for or competes with the original work. Where a tool is built to displace the very market it drew from, a fair use defense is far weaker, a distinction that can determine the entire case and that rights holders increasingly emphasize alongside their copyright law arguments.

Does It Matter How the Training Data Was Obtained?

Yes. How training data was obtained has become one of the most important variables. Several decisions distinguished between works a developer acquired lawfully, such as by purchasing copies, and works downloaded from pirated or unauthorized sources. Even where training itself was treated as transformative, building and keeping a library of pirated works has been treated as separate conduct that can carry its own substantial liability.

The practical lesson is that lawful sourcing of training data can be decisive, independent of the fair use question. AI developers face heightened risk when datasets include pirated material, and documenting where training data came from and on what legal basis has become an important safeguard.
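For developers, the documentation described above can be kept as a simple per-work provenance record. The following is only a minimal sketch under stated assumptions: the field names, the `SourceRecord` class, the `LOWER_RISK_METHODS` set, and the review rule are all hypothetical illustrations, not a legal standard and not a method drawn from any of the decisions discussed here. Whether a given source is lawful remains a question for counsel.

```python
import json
from dataclasses import dataclass, asdict
from datetime import date

# Hypothetical categories treated as lower-risk for illustration only.
# This is an assumption of the sketch, not a legal determination.
LOWER_RISK_METHODS = {"purchased", "licensed", "public_domain"}


@dataclass(frozen=True)
class SourceRecord:
    """One training work and how it was obtained (all fields are illustrative)."""
    work_id: str
    title: str
    acquired_from: str       # vendor, publisher, website, etc.
    acquisition_method: str  # e.g. "purchased", "licensed", "scraped", "unknown"
    legal_basis: str         # short note on the claimed basis for use
    acquired_on: date
    terms_reviewed: bool     # whether applicable site or license terms were checked


def needs_review(record: SourceRecord) -> bool:
    """Flag records with an undocumented method or unreviewed terms."""
    return (
        record.acquisition_method not in LOWER_RISK_METHODS
        or not record.terms_reviewed
    )


def flag_for_review(records: list[SourceRecord]) -> list[str]:
    """Return the IDs of works that should go to legal review."""
    return [r.work_id for r in records if needs_review(r)]


def to_json(records: list[SourceRecord]) -> str:
    """Serialize the manifest so it can be preserved as evidence of sourcing."""
    return json.dumps([asdict(r) for r in records], default=str, indent=2)
```

A record of this kind does not answer the fair use question. It only makes it easier to show, work by work, where the data came from and which items were escalated for advice.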

3. What This Means for Different Parties

AI copyright litigation affects three broad groups differently: the creators and rights holders whose works are at stake, the AI developers who build and train models, and the businesses that use AI to generate content. Each faces distinct risks and has distinct options as the law develops. Understanding which group a given concern falls into helps clarify what the realistic exposure and next steps are.

Prudent steps look similar across groups: documenting sources and uses, monitoring the developing case law, and getting tailored advice rather than relying on early headlines. The difference lies in what each party is trying to protect.

What Should Creators and Rights Holders Know?

Creators and rights holders, including authors, artists, photographers, news organizations, and music companies, are the plaintiffs driving most of these cases. Their central challenge is showing not just that their works were copied, but that the use was not fair, often by demonstrating market harm such as lost licensing revenue or direct competition from AI output. For output claims, a rights holder generally must show substantial similarity between the AI-generated content and the protected work, the same standard that governs traditional infringement.

Rights holders are also responding through licensing and contract terms, increasingly restricting AI training with clauses sometimes called "no train" provisions. This matters because of a separate risk for AI developers: even where copying might qualify as fair use under copyright law, scraping a site in violation of its terms of service can give rise to a breach of contract claim, an independent theory that does not depend on winning the copyright argument. For those weighing a claim, preserving evidence of how works were used, what market harm resulted, and what contractual terms applied is valuable, and licensing strategy can complement litigation, as it does across copyright licensing generally.

What Should Businesses Using AI Know?

Businesses that use AI to create content face a risk many have not fully considered: even where AI training is found to be fair use, the user can be liable for infringing outputs they publish. A marketing team, consultancy, or publisher that uses AI to generate material bears the same copyright responsibility for that output as it would for human-created work, so AI-generated content that is substantially similar to a protected work can expose the business to a claim.

The defensive steps are practical. Maintain records of how AI tools are used, adopt usage policies, review AI-generated material before publishing it, and prefer tools and vendors that offer clear terms and, where available, indemnification. These measures both reduce the chance of infringing outputs and demonstrate good faith, strengthening the company's position if a dispute arises.

4. When AI Copyright Issues Need a Lawyer

AI copyright litigation is complex, fast-moving, and fact-specific, so guidance tailored to a particular situation is far more reliable than general conclusions drawn from headlines. Whether a claim or defense is strong depends on how training data was acquired, whether outputs are substantially similar to protected works, what market harm exists, what contractual terms applied, and how the developing case law maps onto the specific facts.

Legal support is especially valuable for a rights holder evaluating whether works were used and whether a claim is viable, for an AI developer assessing training-data and output risk or facing a suit, and for a business wanting to use AI safely through usage policies, vendor and indemnification review, and output safeguards. A lawyer can assess exposure under the current case law, preserve and develop the evidence that matters most, such as data sourcing and market harm, and represent a party in litigation or settlement. Since the stakes can be high, getting advice early, and revisiting it as new decisions come down, is the prudent course on any side of these disputes.

5. Frequently Asked Questions about AI Copyright Litigation

These questions come from creators, AI developers, and businesses trying to understand AI copyright disputes. The law here is developing quickly, and outcomes depend on specific facts and current case law.

Is It Legal to Train AI on Copyrighted Works?

It depends on the facts. Several 2025 federal trial court decisions found that training AI on copyrighted works can be fair use, particularly where the use is transformative and the works were acquired lawfully, while another rejected the defense where the AI product competed directly with the original. How the data was obtained also matters, since using pirated copies has been treated as separate, potentially infringing conduct. These are trial-level decisions subject to appeal, and there is no dedicated AI training exception, so there is no settled national answer yet.

What Is the Difference between Input and Output Infringement?

Input infringement concerns the copying done to train a model, namely reproducing works into a training dataset, and primarily exposes the AI developer. Output infringement concerns what the AI generates, specifically whether the content is substantially similar to a protected work, and can expose the business or person who publishes that output. The two are analyzed separately, so a case can find training lawful while still leaving users liable for infringing outputs.

Can I Sue If an AI Was Trained on My Work?

Possibly, but a viable claim usually requires more than showing your work was included in a dataset. Most cases turn on whether the use was fair, where evidence of market harm is often decisive, and on how the data was acquired. If an AI product substitutes for or competes with your work, or if your work was taken from a pirated source, the position is stronger. There may also be a separate breach of contract claim if the work was taken in violation of a site's terms of service. Because outcomes are fact-specific, an evaluation of your particular facts against the current case law is the realistic starting point.

Can My Business Be Liable for Using AI-Generated Content?

Yes. Even where AI training is found to be fair use, a business can be liable for publishing AI-generated content that is substantially similar to a copyrighted work, since the responsibility for AI output is generally the same as for human-created content. Practical safeguards include reviewing AI-generated material before publication, adopting usage policies, and favoring vendors that provide clear terms and indemnification.

What Is Fair Use in the AI Context?

Fair use is a defense under Section 107 of the Copyright Act that allows limited use of copyrighted material without permission, judged by four factors: the purpose and character of the use, including whether it is transformative; the nature of the work; the amount used; and the effect on the work's market. In AI cases, developers argue training is transformative because the model learns patterns rather than republishing works, while rights holders emphasize wholesale copying and harm to licensing markets. Courts have applied these factors differently, so fair use in AI is not guaranteed and depends on the specific facts.

Is AI-Generated Work Protected by Copyright?

This is a separate question from infringement, and the answer depends on human authorship. Copyright protection has traditionally required human authorship, and the Copyright Office's 2025 guidance reaffirmed that purely machine-generated output without sufficient human creative input faces obstacles to protection. Where a human meaningfully shapes and arranges AI-assisted output, some protection may be available for the human contribution, but the scope is still being worked out, so the copyright status of AI-assisted work should be treated as uncertain.

22 Jun, 2026
