OpenAI and Google trained their AI models on text transcribed from YouTube videos, which could violate creators' copyrights, according to The New York Times. The report, which outlines the lengths OpenAI, Google and Meta have gone to in order to maximize the amount of data they can feed their AIs, cites numerous people with knowledge of the companies' practices. It comes just days after YouTube CEO Neal Mohan said in an interview that OpenAI's alleged use of YouTube videos to train its new text-to-video generator, Sora, would go against the platform's rules.
According to the NYT, OpenAI used its speech recognition tool, Whisper, to transcribe more than one million hours of YouTube videos, which were then used to train GPT-4. The Information previously reported that OpenAI had used YouTube videos and podcasts to train the two AI systems. OpenAI President Greg Brockman was reportedly among those involved in the effort. Under Google's rules, “unauthorized scraping or downloading of YouTube content is not permitted,” Matt Bryant, a Google spokesman, told the NYT, adding that the company was not aware of any such use by OpenAI.
The report, however, claims that some people at Google knew about it but did not act against OpenAI because Google was using YouTube videos to train its own AI models. Google told the NYT that it only does so with videos from creators who have agreed to participate in an experimental program. Engadget has contacted Google and OpenAI for comment.
The report also claims that Google amended its privacy policy in June 2022 to more broadly cover its use of publicly available content, including Google Docs and Google Sheets, to train its AI models and products. Bryant told the NYT that this is only done with the permission of users who opt in to Google's experimental features, and that the company “did not begin training on additional types of data based on this language change.”