OpenAI researchers collaborated with Georgetown University’s Center for Security and Emerging Technology and the Stanford Internet Observatory to investigate how large language models could be misused for disinformation purposes. The collaboration included a workshop in October 2021 that brought together 30 disinformation researchers, machine learning experts, and policy analysts, culminating in a co-authored report based on more than a year of research. This report outlines the threats that language models pose to the information environment if used to augment disinformation campaigns, and introduces a framework for discussing possible mitigations. Read the full report here.
As generative language models improve, they open up new possibilities in fields as diverse as health, law, education, and science. But, as with any new technology, it is worth considering how they can be misused. Focusing on recurring online influence operations (covert or deceptive efforts to influence the opinions of a target audience), the report asks:
How might language models change influence operations, and what steps can be taken to mitigate this threat?
Our work brought together different backgrounds and expertise: researchers grounded in the tactics, techniques, and procedures of online disinformation campaigns, as well as machine learning experts in the field of generative AI, in order to base our analysis on trends in both domains.
We believe it is critical to analyze the threat of AI-enabled influence operations and outline the steps that can be taken before language models are used for influence operations at scale. We hope our research will inform policymakers who are new to the AI or disinformation fields, and spur in-depth research into potential mitigation strategies for AI developers, policymakers, and disinformation researchers.
How could AI affect influence operations?
When researchers evaluate influence operations, they consider the actors, behaviors, and content. The widespread availability of technology powered by language models has the potential to affect all three facets:
- Actors: Language models could drive down the cost of running influence operations, placing them within reach of new actors and types of actors. Likewise, propagandists for hire who automate the production of text may gain new competitive advantages.
- Behavior: Influence operations with language models will become easier to scale, and tactics that are currently expensive (for example, generating personalized content) may become cheaper. Language models may also enable new tactics to emerge, such as real-time content generation in chatbots.
- Content: Language models may produce more impactful or persuasive messaging than propagandists can, especially propagandists who lack the requisite linguistic or cultural knowledge of their target. They may also make influence operations less discoverable, since they repeatedly create new content without needing to resort to copy-pasting and other noticeable time-saving behaviors.
Our bottom-line judgment is that language models will be useful for propagandists and will likely transform online influence operations. Even if the most advanced models are kept private or controlled through application programming interface (API) access, propagandists will likely gravitate toward open-source alternatives, and nation states may invest in the technology themselves.
Critical unknowns
Many factors affect whether and to what extent language models will be used in influence operations. Our report dives into many of these considerations. For example:
- What new influencing capabilities will emerge as a side effect of well-intentioned research or commercial investment? Which actors will make significant investments in language models?
- When will easy-to-use tools for generating text be available to the public? Will it be more effective to design specific language models for influence operations, instead of applying generic models?
- Will norms develop that disincentivize actors who wage AI-enabled influence operations? How will actor intentions develop?
While we expect to see the diffusion of the technology, as well as improvements in the usability, reliability, and efficiency of language models, many questions about the future remain unanswered. Because these are critical possibilities that could change how language models affect influence operations, additional research to reduce uncertainty is highly valuable.
A framework for mitigations
To chart a path forward, the report lays out the key stages in the pipeline from language model to influence operation. Each of these stages is a point for potential mitigations. To successfully wage an influence operation leveraging a language model, propagandists would require that: (1) a model exists, (2) they can reliably access it, (3) they can disseminate content from the model, and (4) an end user is affected. Many possible mitigation strategies fall along these four steps, as shown below.
| Stage of the pipeline | 1. Model construction | 2. Model access | 3. Content dissemination | 4. Belief formation |
| --- | --- | --- | --- | --- |
| Illustrative mitigations | AI developers build models that are more fact-sensitive. | AI providers impose stricter usage restrictions on language models. | Platforms and AI providers coordinate to identify AI content. | Institutions engage in media literacy campaigns. |
| | Developers spread radioactive data to make generative models detectable. | AI providers develop new norms around model release. | Platforms require "proof of personhood" to post. | Developers provide consumer-focused AI tools. |
| | Governments impose restrictions on data collection. | AI providers close security vulnerabilities. | Entities that rely on public input take steps to reduce their exposure to misleading AI content. | |
| | Governments impose access controls on AI hardware. | | Digital provenance standards are widely adopted. | |
If a mitigation exists, is it desirable?
Just because a mitigation can reduce the threat of AI-enabled influence operations does not mean it should be implemented. Some mitigations carry their own downside risks. Others may not be feasible. While we do not explicitly endorse or rate mitigations, the report provides a set of guiding questions for policymakers and others to consider:
- Technical feasibility: Is the proposed mitigation technically feasible? Does it require significant changes to technical infrastructure?
- Social feasibility: Is the mitigation feasible from a political, legal, and institutional perspective? Does it require costly coordination, are key actors incentivized to implement it, and is it actionable under existing laws, regulations, and industry standards?
- Downside risk: What are the potential negative impacts of the mitigation, and how significant are they?
- Impact: How effective would a proposed mitigation be at reducing the threat?
We hope that this framework will generate ideas for other mitigation strategies and that the guiding questions will help relevant institutions to begin to consider whether various mitigations are worth undertaking.
This report is far from the last word on AI and the future of influence operations. Our goal is to define the current environment and help set an agenda for future research. We encourage anyone interested in collaborating or discussing relevant projects to connect with us. To learn more, read the full report here.
Josh A. Goldstein (Georgetown University's Center for Security and Emerging Technology)
Girish Sastry (OpenAI)
Micah Musser (Georgetown University's Center for Security and Emerging Technology)
Renée DiResta (Stanford Internet Observatory)
Matthew Gentzel (Longview Philanthropy) (work done while at OpenAI)
Katerina Sedova (US Department of State) (work done while at the Center for Security and Emerging Technology, prior to government service)