Introducing a new and improved content moderation tool: the moderation endpoint improves on our previous content filter and is available for free today to OpenAI API developers.
To help developers protect their applications against possible misuse, we are introducing a faster and more accurate moderation endpoint. This endpoint provides OpenAI API developers with free access to GPT-based classifiers that detect undesired content, an instance of using AI systems to help with human oversight of these systems. We have also released a technical paper describing our methodology and the dataset used for evaluation.
When given a text input, the moderation endpoint assesses whether the content is sexual, hateful, violent, or promotes self-harm, categories of content prohibited by our content policy. The endpoint has been trained to be fast and accurate and to perform robustly across a range of applications. Importantly, this reduces the chances of products "saying" the wrong thing, even when deployed to users at scale. As a consequence, AI can unlock benefits in sensitive settings, like education, where it could not otherwise be used with confidence.
[Figure: text inputs labeled for hate, self-harm, sexual, and violent content are routed through the moderation endpoint.]
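To make the input and output concrete, here is a minimal sketch of a request to the endpoint over plain HTTP, assuming the `/v1/moderations` REST route, a standard bearer-token header, an `OPENAI_API_KEY` environment variable, and the `requests` library; consult the documentation linked below for the authoritative request and response format.

```python
import os
import requests

# Minimal sketch: send a text input to the moderation endpoint
# and inspect the per-category assessment it returns.
resp = requests.post(
    "https://api.openai.com/v1/moderations",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={"input": "Sample text to check."},
)
resp.raise_for_status()

result = resp.json()["results"][0]
print(result["flagged"])          # True if any category is triggered
print(result["categories"])       # per-category boolean verdicts
print(result["category_scores"])  # per-category confidence scores
```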
The moderation endpoint helps developers benefit from our infrastructure investments. Rather than building and maintaining their own classifiers (an extensive process, as we document in our paper), they can instead access accurate classifiers through a single API call.
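As one illustration of what that single call can replace, the hypothetical helper below gates a piece of generated text before it reaches users; the `is_safe` name and the choice to withhold flagged output are our own, not prescribed by the API.

```python
import os
import requests

API_KEY = os.environ["OPENAI_API_KEY"]

def is_safe(text: str) -> bool:
    """Hypothetical helper: one call to the moderation endpoint
    stands in for an in-house classification pipeline."""
    resp = requests.post(
        "https://api.openai.com/v1/moderations",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"input": text},
    )
    resp.raise_for_status()
    return not resp.json()["results"][0]["flagged"]

completion = "...model output..."  # text produced elsewhere by the API
if is_safe(completion):
    print(completion)
else:
    print("[content withheld by moderation check]")
```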
As part of OpenAI's commitment to building a safer AI ecosystem, we provide this endpoint to allow free moderation of all content generated by the OpenAI API. For example, Inworld, an OpenAI API customer, uses the moderation endpoint to help keep its AI-based virtual characters appropriate for their audiences. By leveraging OpenAI's technology, Inworld can focus on its core product: creating memorable characters. Moderation of non-API traffic is not currently supported.
Get started with the moderation endpoint by consulting the documentation. More details on the model's training process and performance are available in our paper. We have also released an evaluation dataset containing Common Crawl data labeled within these categories, which we hope will spur further research in this area.