OpenAI's legal battle with The New York Times over the data used to train its AI models may still be brewing. But OpenAI is forging ahead with deals with other publishers, including some of the largest news publishers in France and Spain.
OpenAI on Wednesday announced that it has signed contracts with Le Monde and Prisa Media to bring French and Spanish news content to OpenAI's ChatGPT chatbot. In a blog post, OpenAI said that the partnership will put the organizations' current-events coverage (from brands like El País, Cinco Días, As and El HuffPost) in front of ChatGPT users where relevant, in addition to contributing to OpenAI's ever-expanding pool of training data.
OpenAI writes:
Over the coming months, ChatGPT users will be able to engage with relevant news content from these publishers through curated summaries with attribution and enhanced links to the original articles, giving users the ability to access additional information or related articles from their news sites… We are continually improving ChatGPT and supporting the news industry's essential role in delivering authoritative information in real time to users.
So far, OpenAI has revealed licensing deals with a handful of content providers. It seems like a good opportunity to take stock:
- Shutterstock stock media library (for images, videos, and music training data)
- The Associated Press
- Axel Springer (owner of Politico and Business Insider, among others)
- Le Monde
- Prisa Media
How much is OpenAI paying each? Well, it isn't saying, at least not publicly. But we can estimate.
The Information reported in January that OpenAI was offering publishers between $1 million and $5 million a year for access to their archives to train its GenAI models. That doesn't tell us much about the Shutterstock partnership. But on the article-licensing front, assuming The Information's reporting is accurate and those figures haven't changed since then, OpenAI is shelling out between $4 million and $20 million a year on news — four news publishers at $1 million to $5 million apiece.
That could be pennies for OpenAI, whose war chest totals more than $11 billion and whose annualized revenue recently surpassed $2 billion (as reported by the Financial Times). But as Homebrew partner and Screendoor co-founder Hunter Walk recently observed, it's a substantial enough sum to potentially price out AI rivals also seeking licensing deals.
Walk writes on his blog:
(I)f experimentation is limited by nine-figure licensing agreements, we are doing innovation a disservice… The controls placed on the 'owners' of training data are creating a huge barrier to entry for challengers. If Google, OpenAI and other large tech companies can set the cost high enough, they implicitly prevent future competition.
Now, whether that barrier to entry exists today is debatable. Many (if not most) AI vendors have chosen to risk the wrath of intellectual property holders by opting not to license the data on which they train their AI models. There's evidence that the art-generating platform Midjourney, for example, is training on stills from Disney movies — and Midjourney has no agreement with Disney.
The more difficult question to resolve is: should licensing simply be a cost of doing business and experimenting in the AI space?
Walk would say no. He advocates a regulator-imposed "safe harbor" that would protect AI providers (as well as startups and small-time researchers) from legal liability, so long as they meet certain ethical and transparency standards.
Interestingly, the United Kingdom recently tried to codify something along those lines, exempting text and data mining for AI training from copyright considerations so long as it's for research purposes. But those efforts ultimately failed.
I, for one, am not sure I'd go as far as Walk does with his "safe harbor" proposal, considering the impact AI threatens to have on an already destabilized news industry. A recent model from The Atlantic found that if a search engine like Google integrated AI into search, it would answer a user's query 75% of the time without requiring a click through to the publisher's website.
But maybe there's room for carve-outs.
Publishers should be paid, and paid fairly. But isn't there an outcome in which they get paid and AI incumbents' rivals, as well as academics, get access to the same data as those incumbents? I'd like to think so. Subsidies are one way. Larger venture capital checks are another.
I can't say I have the solution, especially given that the courts have yet to decide whether (and to what extent) fair use shields AI providers from copyright claims. But it's vital that we figure these things out. Otherwise, the industry could well end up in a situation where the academic "brain drain" continues unabated and only a few powerful companies have access to vast pools of valuable training sets.