The race to lead AI has become a desperate search for the digital data needed to advance the technology. To get that data, tech companies like OpenAI, Google and Meta have taken shortcuts, ignored corporate policies and debated how to bend the law, according to a New York Times analysis.
At Meta, owner of Facebook and Instagram, managers, lawyers and engineers discussed buying the publisher Simon & Schuster last year to obtain long-form works, according to recordings of internal meetings obtained by The Times. They also agreed to collect copyrighted data from the internet, even if it meant facing lawsuits. Negotiating licenses with publishers, artists, musicians and the news industry would take too long, they said.
Like OpenAI, Google transcribed YouTube videos to collect text for its AI models, five people with knowledge of the company's practices said. That potentially violated the copyrights to the videos, which belong to their creators.
Last year, Google also expanded its terms of service. One motivation for the change, according to members of the company's privacy team and an internal message seen by The Times, was to allow Google to access Google Docs, restaurant reviews on Google Maps and other publicly available online material for more of its AI products.
The companies' actions illustrate how online information (news, fiction, forum posts, Wikipedia articles, computer programs, photographs, podcasts, and movie clips) has increasingly become the lifeblood of the burgeoning artificial intelligence industry. Creating innovative systems depends on having enough data to teach technologies to instantly produce text, images, sounds and videos that resemble what a human creates.