Three authors have filed a class action lawsuit against Anthropic, claiming the company used their works without permission to train its AI model, Claude. The lawsuit, filed in a federal court in California on Monday (August 19), accuses Anthropic of using pirated versions of books by hundreds of authors to develop its AI.
Data on “The Pile”
Authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson allege that their works were included in a large dataset called “The Pile,” which Anthropic used to train its AI model, Claude. This dataset includes “Books3,” a collection of pirated ebooks featuring works by several well-known authors. The lawsuit accuses Anthropic of intentionally using these copyrighted materials without permission.
While Anthropic has acknowledged using The Pile to train Claude, the company has not commented directly on the lawsuit. It has stated that it is aware of the legal action and is reviewing the complaint.
The authors seek a court ruling that recognizes the copyright infringement and prevents Anthropic from using their works without authorization. They also request financial compensation, although the exact amount has not been specified.
Legal issues regarding data usage
The lawsuit adds to the ongoing dispute between authors and other digital content creators on one side and the tech companies that use their work to train AI models on the other.
Tech companies often scrape the internet to gather the vast amounts of data needed to develop their AI systems. Although this practice underpins much of current AI development, it raises significant legal and ethical issues, particularly regarding copyright and privacy.
Other authors and copyright holders have also taken legal action against tech companies for scraping their content from the web. Major players like OpenAI and Meta face similar accusations of unlawfully using copyrighted material in their AI models.