Britain’s data watchdog has issued a warning to tech companies about using people’s personal information to develop chatbots after concerns that the underlying technology is trained on vast amounts of raw material culled from the Web.
The intervention by the Information Commissioner’s Office came after its Italian counterpart temporarily banned ChatGPT over data privacy concerns.
The ICO said companies that develop and use chatbots must respect people’s privacy when building generative artificial intelligence systems. ChatGPT, the best-known example of generative AI, is built on a type of system known as a large language model (LLM), which is “trained” on vast amounts of data pulled from the Internet.
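At a toy scale, the principle works like this (a minimal sketch in Python, not a description of how OpenAI builds its models; the corpus string is invented for the example): a model ingests text and derives statistics about which words tend to follow which, so any personal details present in the scraped text become part of what it learns.

    from collections import Counter, defaultdict

    # Stand-in for text scraped from the web; real LLMs train on
    # hundreds of billions of words, and anything in that text,
    # including personal details, feeds the learned statistics.
    corpus = ("the model learns patterns from text "
              "the model predicts the next word from context")

    tokens = corpus.split()

    # Count which word follows which: the simplest possible "language model".
    following = defaultdict(Counter)
    for word, nxt in zip(tokens, tokens[1:]):
        following[word][nxt] += 1

    def predict_next(word):
        """Most likely next word according to the 'training' data."""
        seen = following.get(word)
        return seen.most_common(1)[0][0] if seen else "<unknown>"

    print(predict_next("the"))  # -> "model"

Scaled up to billions of words, those statistics are what let a chatbot generate fluent text, and they are also why regulators care about what goes into the training data.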
“There really can be no excuse for misunderstanding the privacy implications of generative AI. We will work hard to make sure organizations get it right,” said Stephen Almond, the ICO’s chief technology and innovation officer.
In a blog post, Almond pointed to Italy’s decision and to a letter published last week, signed by academics and tech industry figures including Elon Musk and the Apple co-founder Steve Wozniak, calling for an immediate pause on “giant AI experiments” for at least six months. The letter said there were concerns that tech companies were creating “increasingly powerful digital minds” that no one could “reliably understand, predict or control.”
Almond said that in his own conversation with ChatGPT, the chatbot told him that generative AI had “the potential to pose data privacy risks if not used responsibly.” He added: “It doesn’t take much imagination to see the potential for a company to rapidly damage a hard-earned relationship with customers through the misuse of generative AI.”
Referring to the LLM training process, Almond said that data protection law still applies when the personal information being processed comes from publicly accessible sources.
A checklist published by the ICO on Monday said that under the UK’s General Data Protection Regulation (GDPR), there must be a legal basis for processing personal data, such as a person giving “clear consent” for their data to be used. Other legal bases that do not require consent, such as a “legitimate interest,” are also available, the checklist said.
Almond added that companies must carry out a data protection impact assessment and mitigate security risks, such as personal data leaks and so-called membership inference attacks, in which rogue actors try to work out whether a particular person’s information was included in the training data for an LLM.
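In rough terms, the simplest published versions of such an attack exploit the fact that a model tends to be unusually confident about text it saw during training. The sketch below is hypothetical Python: query_model_loss stands in for whatever access an attacker has to the target model, and the threshold is something an attacker would have to calibrate.

    # Hypothetical sketch of a loss-threshold membership inference test.
    # query_model_loss is a stand-in for access to the target model;
    # no real API is implied here.

    def query_model_loss(text):
        """Return the model's average per-token loss on `text` (stub)."""
        raise NotImplementedError("requires access to the target model")

    def likely_in_training_data(record, threshold=2.0):
        # A low loss means the model is suspiciously sure of this exact
        # text, which hints it may have been seen during training.
        return query_model_loss(record) < threshold

    # An attacker would probe with a specific person's details, e.g.:
    # likely_in_training_data("Jane Doe, 12 Example Street")

Real attacks refine this crude check, for instance by comparing against reference models, but the principle is the same: the model’s own behaviour can leak whether it has seen a particular record before.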
The Italian data protection watchdog announced a temporary ban on ChatGPT on Friday, citing a data leak last month and concerns about the use of personal data in the system that underpins the chatbot. The watchdog said there appeared to be “no legal basis to support the massive collection and processing of personal data to ‘train’ the algorithms on which the platform was based.”
In response to the Italian ban, Sam Altman, the chief executive of OpenAI, which developed ChatGPT, said: “We believe we are following all privacy laws.” But the company has refused to share information about what data was used to train GPT-4, the latest version of the underlying technology that powers ChatGPT.
The previous version, GPT-3, was trained on 300 billion words pulled from the public Internet, as well as the content of millions of e-books and the entire English Wikipedia.