Reddit brands Anthropic as 'anything but' a white knight, heating up AI scraping wars
A clash between established online content providers and artificial intelligence upstarts is heating up again as AI-driven large language models gobble information in a race to dominate the web’s frontier.
The latest battle in the AI scraping wars is between Reddit (RDDT) and AI startup Anthropic (ANTH.PVT), a company backed by tech giants Amazon (AMZN) and Google (GOOG, GOOGL) that created the AI language model Claude.
Reddit is claiming in a new lawsuit that Anthropic intentionally scraped Reddit users' personal data without their consent and then put their data to work training Claude.
Reddit said in its complaint that Anthropic "bills itself as the white knight of the AI industry" and argues that "it is anything but."
Anthropic said last year that it had blocked its bots from Reddit’s website, according to the complaint. But Reddit said Anthropic “continued to hit Reddit’s servers over one hundred thousand times.”
An Anthropic spokesperson said, "We disagree with Reddit's claims and will defend ourselves vigorously."
Anthropic is also defending itself against a separate suit from music publishers, including Universal Music Group (0VD.F), ABKCO, and Concord, alleging that Anthropic infringed copyrights on works by Beyoncé, the Rolling Stones, and other artists when it trained Claude on lyrics to more than 500 songs.
The confrontation between Reddit and Anthropic adds to a growing number of high-profile cases where copyright holders have tried to guard their works from the reach of technology firms.
A question at the heart of all these lawsuits: Can artificial intelligence companies use copyrighted material to train generative AI models without asking the owner of that data for permission?
Courts haven't settled on the answer. However, last February, the US District Court for the District of Delaware handed copyright holder Thomson Reuters a win in a case that could affect what data AI models can legally be trained on.
The court granted Thomson Reuters' request for summary judgment, saying that its competitor, Ross, infringed on its copyrights by using lawsuit summaries to train its AI model.
The court rejected Ross's argument that it could use the summaries under the concept of fair use, which allows copyrighted material to be used without permission for purposes such as news reporting, teaching, research, criticism, and commentary.
One big name featuring prominently in some of these clashes is OpenAI (OPAI.PVT), the creator of the chatbot ChatGPT, which is run by Sam Altman and backed by Microsoft (MSFT).
Comedian Sarah Silverman has accused the companies in a lawsuit of copying material from her book and 7 million pirated works in order to train their AI systems. Parenting website Mumsnet has also accused OpenAI of scraping its six billion-word database without consent.