To help developers protect their apps from potential misuse, we’re introducing a faster and more accurate Moderation endpoint. This endpoint provides OpenAI API developers with free access to GPT-based classifiers that detect undesired content, an instance of using AI systems to aid human oversight of those systems. We have also released a technical paper describing our methodology and the dataset used for evaluation.
Given a text input, the Moderation endpoint assesses whether the content is sexual, hateful, or violent, or promotes self-harm, all of which is prohibited by our content policy. The endpoint has been trained to be fast and accurate, and to perform robustly across a range of applications. Importantly, this reduces the chances of products “saying” the wrong thing, even when deployed to users at scale. As a consequence, AI can unlock benefits in sensitive environments, such as education, where it could not otherwise be used with confidence.
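As a minimal sketch of how a developer might call the endpoint, here is an example using the pre-1.0 `openai` Python package; the sample input string is illustrative, and the exact category names returned may vary with the API version:

```python
import os
import openai

# Assumes an API key is available in the environment.
openai.api_key = os.environ["OPENAI_API_KEY"]

# Ask the Moderation endpoint to classify a piece of text.
response = openai.Moderation.create(input="Sample text to classify goes here.")

result = response["results"][0]
print(result["flagged"])           # True if the text violates the content policy
print(result["categories"])        # per-category boolean verdicts (e.g. "hate", "violence")
print(result["category_scores"])   # per-category model confidence scores
```

An app can act on the top-level `flagged` field for a simple allow/block decision, or inspect the per-category scores to apply stricter thresholds for the categories most sensitive to its audience.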