OpenAI is expanding its internal safety processes to fend off the threat of harmful AI. A new "Safety Advisory Group" will sit above the technical teams and make recommendations to leadership, and the board has been given veto power; whether the board will actually use it is, of course, another question entirely.
Typically, the ins and outs of policies like these don't need coverage, as in practice they amount to a lot of closed-door meetings with obscure roles and lines of responsibility that outsiders will rarely be privy to. While that's likely true here as well, the recent leadership fight and the evolving debate over AI risks warrant a look at how the world's leading AI development company is approaching safety considerations.
In a new document and blog post, OpenAI discusses its updated "Preparedness Framework," which one imagines got a bit of a retooling after the November shakeup that removed the board's two most "decelerationist" members: Ilya Sutskever (still at the company in a somewhat changed role) and Helen Toner (totally gone).
The main purpose of the update appears to be to show a clear path for identifying, analyzing, and deciding what to do about the "catastrophic" risks inherent to the models the company is developing. As they define them:
By catastrophic risk, we mean any risk that could result in hundreds of billions of dollars in economic damage or lead to the severe harm or death of many individuals; this includes, but is not limited to, existential risk.
(Existential risk is the "rise of the machines" type of thing.)
Models in production are governed by a "safety systems" team; this is for, say, systematic abuses of ChatGPT that can be mitigated with restrictions or API tweaks. Frontier models in development get the "preparedness" team, which tries to identify and quantify risks before the model is released. And then there is the "superalignment" team, which is working on theoretical guardrails for "superintelligent" models, which we may or may not be anywhere near.
The first two categories, being real and not fictional, have a relatively easy-to-understand rubric. Their teams rate each model in four risk categories: cybersecurity, "persuasion" (e.g., disinformation), model autonomy (i.e., acting on its own), and CBRN (chemical, biological, radiological, and nuclear threats, e.g., the ability to create novel pathogens).
Various mitigations are assumed: for example, a reasonable reticence to describe the process for making napalm or homemade bombs. After taking known mitigations into account, if a model is still evaluated as having "high" risk, it cannot be deployed, and if a model has any "critical" risks, it will not be developed further.
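To make that rule concrete, here is a minimal, entirely hypothetical sketch of the gate the framework describes. The category and level names mirror OpenAI's document, but the code itself is an illustration of the stated policy, not anything OpenAI has published:

```python
# Hypothetical sketch (not OpenAI code): the deployment gate the framework
# describes, applied to post-mitigation risk scores in the four tracked categories.

RISK_LEVELS = ["low", "medium", "high", "critical"]
CATEGORIES = ["cybersecurity", "persuasion", "model_autonomy", "cbrn"]

def deployment_decision(post_mitigation_scores: dict[str, str]) -> str:
    """Return what the framework allows, given a post-mitigation risk level
    (one of RISK_LEVELS) for each tracked category."""
    worst = max(post_mitigation_scores.values(), key=RISK_LEVELS.index)
    if worst == "critical":
        return "halt further development"
    if worst == "high":
        return "do not deploy"
    return "eligible for deployment"

# Example: a model that remains "high" risk in cybersecurity after mitigations.
print(deployment_decision({
    "cybersecurity": "high",
    "persuasion": "medium",
    "model_autonomy": "low",
    "cbrn": "low",
}))  # -> "do not deploy"
```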
These risk levels are actually documented in the framework, in case you were wondering whether they are to be left to the discretion of some engineer or product manager.
For example, in the cybersecurity section, the most practical of the lot, it's a "medium" risk to increase "the productivity of operators… on key cyber operation tasks" by a certain factor. A high-risk model, on the other hand, would "identify and develop proofs-of-concept for high-value exploits against hardened targets without human intervention." At the critical level, "the model can devise and execute novel end-to-end strategies for cyberattacks against hardened targets given only a high-level desired goal." Obviously we don't want that out there (though it would sell for quite a sum).
I've asked OpenAI for more information on how these categories are defined and refined (for instance, whether a new risk like photorealistic deepfake video of people falls under "persuasion" or gets a new category) and will update this post if I hear back.
So, one way or the other, only medium and high risks are to be tolerated. But the people making those models aren't necessarily the best ones to evaluate them or to make recommendations. For that reason, OpenAI is creating a cross-functional "Safety Advisory Group" that will sit on top of the technical side, reviewing the experts' reports and making recommendations from a higher vantage. Hopefully (they say) this will uncover some "unknown unknowns," though by their nature those are fairly hard to catch.
The process requires that these recommendations be sent simultaneously to the board and to leadership, which we understand to mean CEO Sam Altman and CTO Mira Murati, plus their lieutenants. Leadership will make the decision on whether to ship it or shelve it, but the board will be able to reverse those decisions.
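For what it's worth, the decision flow as described boils down to something very simple. The following sketch is again hypothetical, my own encoding of the process the blog post lays out, with made-up names for the pieces:

```python
# Hypothetical sketch of the approval flow as described: recommendations reach
# leadership and the board at the same time; leadership decides whether to ship
# or hold, and the board may overturn that decision.

from dataclasses import dataclass

@dataclass
class Recommendation:
    model: str
    advisory_group_verdict: str  # e.g. "ship" or "hold"

def final_decision(rec: Recommendation,
                   leadership_decision: str,
                   board_veto: bool) -> str:
    """The board's veto, if exercised, overrides whatever leadership chose."""
    if board_veto:
        return "hold"
    return leadership_decision

# Example: the advisory group urges caution, leadership ships anyway,
# and the board declines to exercise its veto.
rec = Recommendation(model="frontier-model-x", advisory_group_verdict="hold")
print(final_decision(rec, leadership_decision="ship", board_veto=False))  # -> "ship"
```

Which is exactly why the interesting question is not the mechanism but whether the veto ever gets used.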
This will hopefully short-circuit something similar to what was rumored to have happened before the big drama: a high-risk product or process getting the green light without the board's knowledge or approval. Of course, the result of said drama was the sidelining of two of the most critical voices and the appointment of some money-minded guys (Bret Taylor and Larry Summers) who are smart but nowhere near experts in artificial intelligence.
If a panel of experts makes a recommendation and the CEO decides based on that information, will this friendly board really feel empowered to contradict them and put on the brakes? And if they do, will we hear about it? Transparency is not really addressed, beyond the promise that OpenAI will solicit audits from independent third parties.
Say a model is developed that warrants a "critical" risk category. OpenAI hasn't been shy about tooting its own horn on this kind of thing in the past: talking about how wildly powerful its models are, to the point of declining to release them, is great advertising. But do we have any kind of guarantee that will happen, if the risks are so real and OpenAI is so concerned about them? Maybe it's a bad idea. Either way, it isn't really mentioned.