OpenAI Under Fire for Allegedly Scraping YouTube Data to Train GPT-4
April 7, 2024
OpenAI, under Sam Altman's leadership, used transcripts of more than a million hours of YouTube videos to train its GPT-4 model.
The company's president, Greg Brockman, played a direct role in gathering the YouTube video data.
OpenAI considered its use of the data to be fair use, but the practice may have violated YouTube's policies.
Google, which owns YouTube, prohibits unauthorized data scraping and has been made aware of OpenAI's potential infringement.
Reports from last year had already indicated that OpenAI, which is backed by Microsoft, used YouTube data for AI training.
OpenAI employs a mix of public and non-public data sources to stay competitive in global AI research.
The company is facing increased scrutiny over its data collection methods for AI development, particularly concerning YouTube content.
Summary based on 3 sources
Sources

Ground News • Apr 7, 2024
OpenAI and Google's Data Use Strategies Spark Debate in AI Development
Telangana Today • Apr 7, 2024
OpenAI trained AI model with million hours of YouTube videos: Report