Our conversations with The New York Times appeared to be progressing constructively through our last communication on December 19. The negotiations focused on a high-value partnership around real-time display of their content with attribution in ChatGPT, in which The New York Times would gain a new way to connect with their existing and new readers, and our users would gain access to their reporting. We had explained to The New York Times that, like any single source, their content did not contribute meaningfully to the training of our existing models and would not be sufficiently impactful for future training. Their December 27 lawsuit, which we learned about by reading The New York Times, came as a surprise and disappointment to us.
Along the way, they had mentioned seeing some regurgitation of their content, but repeatedly refused to share any examples, despite our commitment to investigating and fixing any issues. We have demonstrated how seriously we treat this as a priority, such as in July, when we took down a ChatGPT feature immediately after learning it could reproduce real-time content in unintended ways.
Interestingly, the regurgitations The New York Times induced appear to come from years-old articles that have proliferated on multiple third-party websites. It seems they intentionally manipulated prompts, often including lengthy excerpts of articles, in order to get our model to regurgitate. Even when using such prompts, our models don't typically behave the way The New York Times insinuates, which suggests they either instructed the model to regurgitate or cherry-picked their examples from many attempts.
Despite their claims, this misuse is not typical or permitted user activity, and it is not a substitute for The New York Times. Regardless, we are continually making our systems more resistant to adversarial attacks aimed at regurgitating training data, and we have already made significant progress in our recent models.