Microsoft Faces Lawsuit By Authors Over Alleged Use Of Pirated Books To Train AI Model

Published 11 hours ago • 2 minute read

A group of authors has filed a lawsuit against Microsoft, accusing the tech giant of unlawfully using their copyrighted books to train its Megatron artificial intelligence model.

The complaint, filed Tuesday in a New York federal court, alleges that Microsoft relied on a dataset containing nearly 200,000 pirated digital books to build and train the AI system, without the authors’ consent.

Among the plaintiffs are Kai Bird, Jia Tolentino, and Daniel Okrent. They claim Microsoft’s AI model mimics the “syntax, voice, and themes” of their original works, effectively creating derivative content from stolen intellectual property.

The lawsuit is the latest in a growing wave of legal challenges by authors, publishers, and copyright holders against major tech companies including Meta, Anthropic, and OpenAI, which are accused of exploiting creative works to develop generative AI tools without permission or compensation.

According to the complaint, Microsoft’s Megatron model was trained using pirated texts to produce human-like responses to prompts. The authors argue that this training process not only infringes on their copyrights but also undermines the value of their original work. They are seeking a court injunction to stop Microsoft from further use of their material and statutory damages of up to $150,000 per infringed work.

The lawsuit comes just one day after a California federal judge issued the first major US ruling on AI and copyright, stating that while AI companies like Anthropic may be allowed to use copyrighted material under the “fair use” doctrine, they could still be held liable if the works were obtained illegally.

Microsoft, which has not yet commented on the lawsuit, is a key investor in OpenAI and has been expanding its AI capabilities rapidly through products integrated into its Office and Azure platforms. Meanwhile, attorneys for the authors declined to comment on the ongoing case.

Tech firms have long argued that their use of copyrighted material for AI training qualifies as fair use, especially when the resulting models produce new and transformative content. But critics say such practices amount to systematic exploitation of creative labour, threatening the livelihoods of writers, artists, and journalists.

The outcome of this case could have major implications for how intellectual property is treated in the age of artificial intelligence, as courts begin to grapple with the legal boundaries of machine learning and content generation.

Melissa Enoch
