Google's AI Training Controversy: Publishers' Opt-Out Requests Ignored, DOJ Takes Action in Antitrust Case
May 3, 2025
A document from August 2024 indicated that Google filtered out 80 billion of the 160 billion tokens in its AI training data, half of the total, with the removed tokens representing content from publishers who had opted out.
The DOJ is proposing significant changes to Google's business model, including potential divestitures and restrictions on paying for default search placements, which could impact Google's AI initiatives.
A decision regarding the antitrust remedies is anticipated later in 2025, with potential implications for publisher control over data used in AI training.
During a recent court hearing, Eli Collins, Vice President at Google DeepMind, revealed that Google uses publisher content to train its AI for search, despite explicit opt-out requests from website owners.
Publishers can opt out of having their content used for search AI only by using the robots.txt standard, which governs web crawling, to block their content from being indexed at all.
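The trade-off described above can be illustrated with a minimal sketch using Python's standard-library robots.txt parser. The crawler token "Google-Extended" (an AI-training opt-out token) and the example.com URL are assumptions for illustration and do not come from the reporting summarized here. The sketch only shows what each robots.txt file permits for each crawler token; it does not model how Google actually processes these rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opts out of AI training via an assumed
# "Google-Extended" token while still allowing the search crawler.
AI_OPT_OUT_ONLY = """\
User-agent: Google-Extended
Disallow: /

User-agent: Googlebot
Allow: /
"""

# Hypothetical robots.txt: blocks the search crawler entirely, which
# the article describes as the only way to keep content out of search AI.
BLOCK_SEARCH_CRAWLER = """\
User-agent: Googlebot
Disallow: /
"""


def allowed(robots_txt: str, agent: str,
            url: str = "https://example.com/article") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for name, rules in [("AI opt-out only", AI_OPT_OUT_ONLY),
                        ("Block search crawler", BLOCK_SEARCH_CRAWLER)]:
        print(name, "-> Googlebot allowed:", allowed(rules, "Googlebot"))
```

Under the first file the search crawler is still permitted, so the content stays in the index and, per the testimony, remains available for search AI training. Only the second file keeps the content out, at the cost of disappearing from search results.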
This situation raises significant questions about the complex relationship between tech firms, content creators, and the data used to train AI, particularly regarding data rights and equitable compensation for publishers.
The outcome of the ongoing antitrust lawsuit against Google could set crucial precedents for how technology companies manage content permissions in the AI era and may lead to new regulations enhancing publishers' control over their intellectual property.
The ongoing legal challenges faced by AI companies, including Google, highlight the tensions surrounding content use and copyright infringement in AI training.
Overall, this case underscores the urgent need for clarity and fairness in the relationship between tech companies and content creators as AI technology continues to evolve.
Publishers are increasingly advocating for greater transparency and control over their content in the context of AI usage, as the balance of power appears to favor tech platforms amid rising AI-generated content.
Collins' testimony highlighted a gap between publishers' intentions with their opt-out requests and Google's actual data practices, raising concerns about the effectiveness of control mechanisms for content owners.
The Department of Justice is leveraging these revelations in its antitrust case against Google, emphasizing internal documents that demonstrate the extent of data removed from DeepMind's training pool.
Summary based on 20 sources
Sources

Yahoo Finance • May 3, 2025
Google Can Train Search AI With Web Content After AI Opt-Out
Economic Times • May 4, 2025
Google can train search AI with web content after AI opt-out
Business Standard • May 4, 2025
Google can train search AI on web content even if publishers opt out
The Business Times
Google can train search AI with web content even after opt-out