Note: As part of our Preparedness Framework, we are investing in the development of improved evaluation methods for AI-enabled safety risks. We believe these efforts would benefit from broader input, and that sharing methods could also be of value to the AI risk research community. To this end, we are presenting some of our early work, today focused on biological risk. We look forward to community feedback and to sharing more of our ongoing research.
Background. As OpenAI and other model developers build more capable AI systems, the potential for both beneficial and harmful uses of AI will grow. One potentially harmful use, highlighted by researchers and policymakers, is the ability of AI systems to assist malicious actors in creating biological threats (e.g., see White House 2023, Lovelace 2022, Sandbrink 2023). In one discussed hypothetical example, a malicious actor might use a highly capable model to develop a step-by-step protocol, troubleshoot wet-lab procedures, or even autonomously execute steps of the biothreat creation process when given access to tools like cloud labs (see Carter et al., 2023). However, assessing the feasibility of such hypothetical examples has been limited by insufficient evaluations and data.
Following our recently shared Preparedness Framework, we are developing methodologies to empirically assess these types of risks, to help us understand both where we are today and where we might be in the future. Here, we detail a new evaluation that could help serve as a potential "tripwire" signaling the need for caution and further testing of biological misuse potential. This evaluation aims to measure whether models could meaningfully increase malicious actors' access to dangerous information about biological threat creation, compared to the baseline of existing resources (i.e., the Internet).
To evaluate this, we conducted a study with 100 human participants, comprising (a) 50 biology experts with PhDs and professional wet-lab experience and (b) 50 student-level participants with at least one undergraduate course in biology. Participants in each group were randomly assigned to either a control group, which only had access to the Internet, or a treatment group, which had access to GPT-4 in addition to the Internet. Each participant was then asked to complete a set of tasks covering aspects of the end-to-end process for biological threat creation.(^1) To our knowledge, this is the largest to-date human evaluation of AI's impact on biorisk information.
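To illustrate the assignment scheme described above, here is a minimal Python sketch of stratified randomization within each expertise group; the function name, seeds, and participant ID format are our own illustrative choices, not details from the study:

```python
import random

def assign_groups(participant_ids, seed=0):
    """Randomly split one stratum (experts or students) into an
    Internet-only (control) arm and an Internet + GPT-4 (treatment) arm."""
    rng = random.Random(seed)  # fixed seed for a reproducible assignment
    ids = list(participant_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    return {"control": ids[:half], "treatment": ids[half:]}

# Randomize within each stratum so expert/student balance is preserved
experts = assign_groups([f"expert_{i}" for i in range(50)], seed=1)
students = assign_groups([f"student_{i}" for i in range(50)], seed=2)
```

Randomizing within each stratum, rather than over all 100 participants at once, keeps the expert/student composition identical across the control and treatment arms.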
Findings. Our study assessed performance uplifts for participants with access to GPT-4 across five metrics (accuracy, completeness, innovation, time taken, and self-rated difficulty) and five stages of the biothreat creation process (ideation, acquisition, scale-up, formulation, and release). We found mild uplifts in accuracy and completeness for those with access to the language model. Specifically, on a 10-point scale measuring accuracy of responses, we observed a mean score increase of 0.88 for experts and 0.25 for students compared to the Internet-only baseline, and similar uplifts for completeness (0.82 for experts and 0.41 for students). However, the obtained effect sizes were not large enough to be statistically significant, and our study highlighted the need for more research around what performance thresholds indicate a meaningful increase in risk. Moreover, we note that information access alone is insufficient to create a biological threat, and that this evaluation does not test for success in the physical construction of threats.
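To make the statistical comparison concrete, the following sketch shows how one might test whether a mean-score uplift of this kind is significant and estimate its effect size. The per-participant scores below are randomly generated placeholders (centered so the expert accuracy uplift of 0.88 is reproduced in expectation), not the study's data, and the function name and arm sizes are illustrative assumptions:

```python
import numpy as np
from scipy import stats

def uplift_report(control_scores, treatment_scores):
    """Compare 10-point scores between Internet-only (control) and
    Internet + GPT-4 (treatment) arms: mean uplift, p-value, Cohen's d."""
    control = np.asarray(control_scores, dtype=float)
    treatment = np.asarray(treatment_scores, dtype=float)
    mean_uplift = treatment.mean() - control.mean()
    # Welch's t-test: does not assume equal variance across arms
    t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)
    # Cohen's d with a pooled standard deviation as a rough effect size
    pooled_sd = np.sqrt((control.var(ddof=1) + treatment.var(ddof=1)) / 2)
    return mean_uplift, p_value, mean_uplift / pooled_sd

# Placeholder data: 25 participants per arm, scores clipped to the 0-10 scale
rng = np.random.default_rng(0)
control = rng.normal(6.0, 2.0, size=25).clip(0, 10)
treatment = rng.normal(6.88, 2.0, size=25).clip(0, 10)
uplift, p, d = uplift_report(control, treatment)
print(f"uplift={uplift:.2f}, p={p:.3f}, d={d:.2f}")
```

With arms of roughly this size and score spread, an uplift under one point on a 10-point scale can easily fail to reach conventional significance thresholds, which is one reason we argue below that significance testing alone is a blunt instrument for judging model risk.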
Below, we share our evaluation procedure and the results it yielded in more detail. We also discuss several methodological insights related to capability elicitation and the security considerations needed to run this type of evaluation with frontier models at scale. Finally, we discuss the limitations of statistical significance as an effective method of measuring model risk, and the importance of new research in assessing the meaningfulness of model evaluation results.