Meta Wins AI Training Copyright Lawsuit

Published 20 hours ago • 3 minute read

In a significant legal development for the artificial intelligence industry, U.S. District Judge Vince Chhabria in San Francisco handed Meta a major victory in a copyright lawsuit. The case was brought by a group of 13 authors, including Sarah Silverman and Junot Díaz, who alleged that Meta illegally trained its Llama AI models on their copyrighted works without permission. On Wednesday, Judge Chhabria ruled in Meta’s favor, concluding that the company's use of the works for AI training was sufficiently "transformative" to qualify as "fair use" under copyright law. The decision is the second courtroom win for an AI firm in a single week, following a similar fair use ruling for Anthropic.

The plaintiffs put forward several arguments against fair use, two of which Judge Chhabria explicitly called "clear losers." The first was that Meta's Llama AI could reproduce significant snippets of text from their books; the second was that using their works without permission diminished their ability to license them as AI training data. The judge rejected both, stating that Llama is not capable of generating enough text from the plaintiffs’ books to be substantial, and that the authors are not entitled to a market for licensing their works specifically as AI training data.

However, Judge Chhabria’s ruling came with crucial caveats and pointed to potential weaknesses in the AI industry's broader fair use defense. He stressed that his decision “does not stand for the proposition that Meta’s use of copyrighted materials to train its language models is lawful,” but rather that “these plaintiffs made the wrong arguments and failed to develop a record in support of the right one.” The judge was more receptive to a "potentially winning argument" that the authors failed to sufficiently develop: that generative AI trained on copyrighted material could flood the market with similar works, diluting the market for the originals and weakening the incentive for human creation. He emphasized that copyright law primarily aims to preserve the incentive for human beings to create artistic and scientific works, and that fair use typically does not cover copying that significantly diminishes creators' ability to earn money from their work.

Despite this openness to the market harm argument, the court found that the plaintiffs had not provided sufficient evidence to support it. Meta, by contrast, introduced evidence suggesting its copying had not caused market harm, while the plaintiffs' assertions were dismissed as mere "speculation" for lack of empirical support. The contrast underscores the importance of concrete proof in intellectual property disputes.

The lawsuit alleged that Meta used "shadow libraries" and BitTorrent to download millions of pirated books, including the plaintiffs' works, and trained its Llama models on a dataset of nearly 200,000 books. The judge ruled only on the AI training claim; the question of whether Meta illicitly obtained and reuploaded these libraries via torrenting was not part of this ruling and remains unresolved.

In response to the decision, a Meta spokesperson expressed appreciation, reiterating that "open-source AI models are powering transformative innovations, productivity and creativity for individuals and companies, and fair use of copyright material is a vital legal framework for building this transformative technology." The judge made clear that the consequences of the ruling are limited: it is confined to the specific facts presented, so outcomes in other, similar lawsuits could differ. The plaintiffs’ law firm, Boies Schiller Flexner LLP, said it disagreed with the outcome and pointed to the court’s acknowledgement that feeding copyrighted works into AI models without permission generally violates the law. The firm is currently considering an appeal.

From Zeal News Studio