OpenAI says it is reviewing evidence that the start-up DeepSeek broke its terms of service by harvesting large amounts of data from its AI technologies.
OpenAI, a San Francisco-based start-up now valued at $157 billion, said DeepSeek may have used data generated by OpenAI technologies to teach similar skills to its own systems.
This process, called distillation, is common across the AI field. But OpenAI's terms of service say that the company does not allow anyone to use data generated by its systems to build technologies that compete in the same market.
“We know that PRC groups are actively working to use methods, including what is known as distillation, to replicate advanced U.S. AI models,” OpenAI spokeswoman Liz Bourgeois said in a statement emailed to The New York Times, referring to the People's Republic of China.
“We are aware of and reviewing indications that DeepSeek may have inappropriately distilled our models, and will share information as we know more,” she said. “We take aggressive, proactive countermeasures to protect our technology and will continue working closely with the United States government to protect the most capable models being built here.”
DeepSeek did not immediately respond to a request for comment.
DeepSeek spooked Silicon Valley tech companies and sent U.S. financial markets into a tailspin earlier this week after releasing artificial intelligence technologies that matched the performance of anything else on the market.
The prevailing wisdom had been that the most powerful systems could not be built without billions of dollars in specialized computer chips, but DeepSeek said it had created its technologies using far fewer resources.
Like any other AI company, DeepSeek built its technologies using computer code and data culled from across the internet. AI companies lean heavily on a practice called open sourcing, freely sharing the code that underpins their technologies and reusing code shared by others. They see this as a way to accelerate technological development.
They also need large amounts of online data to train their AI systems. These systems learn their skills by identifying patterns in text, computer programs, images, sounds and videos. The leading systems learn their skills by analyzing almost all of the text on the internet.
Distillation is often used to train new systems. If a company takes data from proprietary technology, the practice can be legally problematic. But it is often allowed by open source technologies.
OpenAI now faces more than a dozen lawsuits accusing it of illegally using copyrighted internet data to train its systems. This includes a lawsuit filed by The New York Times against OpenAI and its partner Microsoft.
The lawsuit contends that millions of articles published by The Times were used to train automated chatbots that now compete with the news outlet as a source of reliable information. Both OpenAI and Microsoft deny the claims.
A Times report also showed that OpenAI has used speech recognition technology to transcribe the audio from YouTube videos, yielding new conversational text that would make an AI system smarter. Some OpenAI employees discussed how such a move might go against YouTube's rules, three people with knowledge of the conversations said.
An OpenAI team, including the company's president, Greg Brockman, transcribed more than one million hours of YouTube videos, the people said. The texts were then fed into a system called GPT-4, which was widely considered one of the most powerful AI models in the world and was the basis of the latest version of the ChatGPT chatbot.