We are clarifying how ChatGPT’s behavior is shaped, along with our plans to improve that behavior, allow more user customization, and get more public input into our decision-making in these areas.
OpenAI’s mission is to ensure that artificial general intelligence (AGI) benefits all of humanity. We therefore think a lot about the behavior of the AI systems we build in the lead-up to AGI, and about how that behavior is determined.
Since our launch of ChatGPT, users have shared outputs that they consider politically biased, offensive, or otherwise objectionable. In many cases, we think the concerns raised have been valid and have uncovered real limitations of our systems that we want to address. We’ve also seen some misconceptions about how our systems and policies work together to shape the outputs you get from ChatGPT.
Here we summarize:
- How ChatGPT’s behavior is shaped;
- How we plan to improve the default behavior of ChatGPT;
- Our intention to allow more customization of the system; and
- Our efforts to get more public input on our decision-making.
Where we are today
Unlike ordinary software, our models are massive neural networks. Their behaviors are learned from a broad range of data, not programmed explicitly. Though not a perfect analogy, the process is more similar to training a dog than to ordinary programming. An initial “pre-training” phase comes first, in which the model learns to predict the next word in a sentence, informed by its exposure to a large amount of Internet text (and a wide range of perspectives). This is followed by a second phase in which we “fine-tune” our models to narrow down system behavior.
As of today, this process is imperfect. Sometimes the fine-tuning process falls short of our intent (producing a safe and useful tool) or of the user’s intent (getting a helpful output in response to a given input). Improving our methods for aligning AI systems with human values is a top priority for our company, particularly as AI systems become more capable.
A two-step process: Pre-training and fine-tuning
The two main steps involved in building ChatGPT work as follows:
- First, we “pre-train” models by having them predict what comes next in a large dataset that contains parts of the Internet. They might learn to complete the sentence “instead of turning left, he turned ___.” By learning from billions of sentences, our models pick up grammar, many facts about the world, and some reasoning abilities. They also absorb some of the biases present in those billions of sentences. (A toy sketch of this next-word objective appears right after this list.)
- Then we “fine-tune” these models on a narrower dataset that we carefully generate with human reviewers who follow the guidelines we provide. Since we cannot predict all the possible inputs that future users may put into our system, we do not write detailed instructions for every input that ChatGPT will encounter. Instead, we outline a few categories in the guidelines that our reviewers use to review and rate possible model outputs for a range of example inputs. Then, while they are in use, the models generalize from this reviewer feedback in order to respond to the wide array of specific inputs provided by a given user.
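To make the “predict what comes next” objective concrete, the sketch below trains a toy next-word predictor on a single hard-coded sentence. It is only an illustration of the general technique under simplifying assumptions; the model, data, and hyperparameters are placeholders and do not reflect OpenAI’s actual training setup.

```python
# Toy sketch of next-token prediction, the objective used in pre-training.
# A tiny embedding + linear model stands in for a large neural network,
# and a single sentence stands in for a large slice of the Internet.
import torch
import torch.nn as nn

corpus = "instead of turning left he turned right".split()
vocab = {word: i for i, word in enumerate(sorted(set(corpus)))}
token_ids = torch.tensor([vocab[w] for w in corpus])

class ToyLM(nn.Module):
    def __init__(self, vocab_size: int, dim: int = 32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        # Returns logits over which word comes next after each input token.
        return self.head(self.embed(ids))

model = ToyLM(len(vocab))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

for step in range(200):
    logits = model(token_ids[:-1])         # predict from each token...
    loss = loss_fn(logits, token_ids[1:])  # ...against the token that follows
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# After training, the model should assign high probability to "right"
# as the word that follows "turned".
id_to_word = {i: w for w, i in vocab.items()}
print(id_to_word[model(token_ids[-2:-1]).argmax().item()])
```

The fine-tuning phase then continues training the same kind of model, but on a far smaller, curated dataset shaped by reviewer ratings rather than on raw Internet text.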
The role of reviewers and OpenAI policies in system development
In some cases, we may give guidance to our reviewers about a certain kind of output (for example, “do not complete requests for illegal content”). In other cases, the guidance we share with reviewers is more high-level (for example, “avoid taking a position on controversial topics”). Importantly, our collaboration with reviewers is not a one-off engagement: it is an ongoing relationship in which we learn a lot from their expertise.
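Purely as an illustration of how category-level guidance could be organized for reviewers, the sketch below pairs invented categories with instructions and example prompts whose candidate outputs would be rated; the categories, wording, and structure are hypothetical, not OpenAI’s actual guidelines.

```python
# Hypothetical structure for category-level reviewer guidance.
# Each category pairs a high-level instruction with example prompts whose
# candidate model outputs reviewers would review and rate.
from typing import List

REVIEWER_GUIDELINES = {
    "illegal_content": {
        "instruction": "Do not complete requests for illegal content.",
        "example_prompts": ["How do I break into someone's house?"],
    },
    "controversial_topics": {
        "instruction": ("Avoid taking a position on controversial issues; "
                        "describe the major viewpoints instead."),
        "example_prompts": ["Which political party should I support?"],
    },
}

def prompts_to_review(category: str) -> List[str]:
    """Example prompts for which reviewers rate candidate model outputs."""
    return list(REVIEWER_GUIDELINES[category]["example_prompts"])

print(prompts_to_review("controversial_topics"))
```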
A big part of the fine-tuning process is maintaining a strong feedback loop with our reviewers, which involves weekly meetings to address any questions they may have or provide clarification on our guidance. This iterative feedback process is how we train the model to get better and better over time.
Addressing biases
Many are rightly concerned about biases in the design and impact of AI systems. We are committed to addressing this issue robustly and to being transparent about both our intentions and our progress. To that end, we are sharing a portion of our guidelines that pertain to political and controversial topics. Our guidelines are explicit that reviewers should not favor any political group. Biases that may nevertheless emerge from the process described above are bugs, not features.
While there will always be disagreements, we hope that sharing this blog post and these guidelines will give more insight into how we view this critical aspect of such a foundational technology. We believe that technology companies must be accountable for producing policies that stand up to scrutiny.
We’re always working to improve the clarity of these guidelines, and based on what we’ve learned from the ChatGPT launch so far, we will provide clearer guidance to reviewers about potential pitfalls and challenges tied to bias, as well as controversial figures and topics. Additionally, as part of ongoing transparency initiatives, we are working to share aggregated demographic information about our reviewers in a way that does not violate privacy rules and norms, since this is an additional source of potential bias in system outputs.
We are currently researching how to make the fine-tuning process more understandable and controllable, building on external advances such as rule-based rewards and Constitutional AI.
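As a general illustration of what a rule-based reward can look like (not a description of OpenAI’s implementation), the sketch below scores a candidate response against explicit, human-readable rules; the rules, weights, and string checks are invented placeholders.

```python
# Hypothetical rule-based reward: explicit, auditable checks score a candidate
# response, and the combined score could serve as part of a reward signal
# during fine-tuning. Rules and weights here are invented for illustration.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Rule:
    name: str
    check: Callable[[str, str], bool]  # (prompt, response) -> rule satisfied?
    weight: float

RULES: List[Rule] = [
    Rule(
        name="no_political_endorsement",
        check=lambda prompt, response: "vote for" not in response.lower(),
        weight=1.0,
    ),
    Rule(
        name="refuses_clearly_illegal_request",
        check=lambda prompt, response: (
            "break into" not in prompt.lower() or "can't help" in response.lower()
        ),
        weight=2.0,
    ),
]

def rule_based_reward(prompt: str, response: str) -> float:
    """Add the weight of each satisfied rule, subtract it for each violation."""
    return sum(r.weight if r.check(prompt, response) else -r.weight for r in RULES)

print(rule_based_reward(
    "Who should I vote for?",
    "I can describe the candidates' platforms so you can decide.",
))  # prints 3.0: both rules satisfied
```

The appeal of rules like these is that they are legible and auditable, which is what makes a fine-tuning process built on them easier to understand and control.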
Where we are going: the building blocks of future systems
In pursuit of our mission, we are committed to ensuring that access to, benefits from, and influence over AI and AGI are widespread. We believe that at least three basic components are required to achieve these goals in the context of AI system behavior.
1. Improve default behavior. We want as many users as possible to find our AI systems useful to them “out of the box” and to feel that our technology understands and respects their values.
To that end, we’re investing in research and engineering to reduce both glaring and subtle biases in how ChatGPT responds to different inputs. In some cases ChatGPT currently refuses outputs that it shouldn’t, and in some cases it doesn’t refuse when it should. We believe that improvement in both respects is possible.
In addition, we have room for improvement in other dimensions of system behavior, such as the system “making things up”. User feedback is invaluable in making these improvements.
2. Define your AI’s values, within broad bounds. We believe that AI should be a useful tool for individual people, and thus customizable by each user up to limits defined by society. Therefore, we are developing an upgrade to ChatGPT that will allow users to easily customize its behavior.
This will mean allowing system outputs that other people (including ourselves) may strongly disagree with. Striking the right balance here will be challenging: taking personalization to the extreme would risk enabling malicious uses of our technology and fawning AIs that mindlessly amplify people’s existing beliefs.
Therefore, there will always be some limits on the behavior of the system. The challenge is to define what those limits are. If we try to make all of these determinations on our own, or if we try to develop a single, monolithic AI system, we will be failing in our Charter commitment to “avoid undue concentration of power.”
3. Public input on defaults and hard limits. One way to avoid undue concentration of power is to give people who use or are affected by systems like ChatGPT the ability to influence the rules of those systems.
We believe that many decisions about our defaults and hard limits should be made collectively, and while practical implementation is challenging, we aim to include as many perspectives as possible. As a starting point, we have sought external input on our technology in the form of red teaming. We also recently began soliciting public input on AI in education, a particularly important context in which our technology is being deployed.
We are in the early stages of piloting efforts to solicit public input on topics such as system behavior, disclosure mechanisms (such as watermarking), and our deployment policies more broadly. We are also exploring partnerships with external organizations to conduct third-party audits of our safety and policy efforts.
Conclusion
Combining the three building blocks above gives the following picture of where we are headed:
Sometimes we will make mistakes. When we do, we will learn from them and iterate on our models and systems.
We appreciate the ChatGPT user community, as well as the wider public’s vigilance in holding us accountable, and we’re excited to share more about our work in the three areas above in the coming months.
If you are interested in doing research to help achieve this vision, including but not limited to research on fairness and representation, alignment, and sociotechnical research to understand the impact of AI on society, please apply for subsidized access to our API through the Researcher Access Program.
We are also hiring for positions in Research, Alignment, Engineering and more.