Google Faces Scrutiny Over AI Training with Publisher Content Despite Opt-Outs
May 3, 2025
Publishers are increasingly pushing for more transparency about, and control over, how AI systems use their content, as the balance of power appears to favor tech platforms.
During a recent court hearing, Eli Collins, a vice president at Google DeepMind, revealed that Google still uses publisher content to train AI for its search products, even when website owners have explicitly opted out.
The only way publishers can keep their content out of Google's search AI is to also opt out of search indexing altogether, using the robots.txt standard that governs web crawling.
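To make the trade-off concrete, here is a minimal sketch using Python's standard `urllib.robotparser` module. The site, URL, and robots.txt contents are hypothetical; `Googlebot` is the user-agent token of Google's search crawler. A robots.txt that blocks `Googlebot` keeps the page from being crawled for Search at all, which, per the testimony, is the only way to keep it away from Google's search AI. Python's parser applies rules in file order rather than Google's own matching logic, but that makes no difference for a file this simple.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for example.com: blocks Google's search crawler
# (user-agent token "Googlebot") from the whole site and allows all others.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
url = "https://example.com/articles/story.html"

# Googlebot is disallowed, so this page would not be crawled for Search.
print(parser.can_fetch("Googlebot", url))     # False
# Crawlers not named in the file fall through to the "*" group and are allowed.
print(parser.can_fetch("SomeOtherBot", url))  # True
```

Removing the `Googlebot` group restores search crawling, and with it, by Collins' account, eligibility for use in search AI training.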
A document from August 2024 indicated that Google had filtered 80 billion of 160 billion tokens, roughly half, out of its AI training data by removing content from publishers who had opted out.
The outcome of the antitrust lawsuit against Google could set important precedents for how tech companies manage content permissions in the AI era and may lead to new regulations that enhance publishers' control over their intellectual property.
The case highlights the complex relationship between tech firms, content creators, and the data used to train AI, raising critical questions about data rights and fair compensation for publishers.
This situation has sparked ongoing debates about copyright and the use of online content by technology companies.
Collins' testimony highlights the unresolved question of how much control publishers have over the data used to train AI. Judge Amit Mehta is expected to rule on antitrust remedies later this year.
The testimony points to a gap between what publishers intend when they opt out and how Google actually uses their data, raising concerns about whether content owners have adequate control mechanisms.
The Department of Justice is using this testimony in its antitrust case against Google, pointing to internal documents that show how much data was removed from DeepMind's training pool.
Collins indicated that, while other AI companies are also competing to deliver accurate results, Google believes its search data could significantly improve its AI models, which raises concerns about competitive fairness.
The DOJ is seeking significant changes to Google's business practices, including restrictions on its AI products, which it argues are tied to Google's search monopoly.
Summary based on 15 sources
Sources

Yahoo Finance • May 3, 2025
Google Can Train Search AI With Web Content After AI Opt-Out
Economic Times • May 4, 2025
Google can train search AI with web content after AI opt-out
Business Standard • May 4, 2025
Google can train search AI on web content even if publishers opt out
The Business Times
Google can train search AI with web content even after opt-out