We are exploring the use of LLMs to address these challenges. Large language models such as GPT-4 can understand and generate natural language, making them applicable to content moderation. The models can make moderation judgments based on policy guidelines provided to them.
With this system, the process of developing and customizing content policies is reduced from months to hours.
- Once a policy guideline is written, policy experts can create a golden data set by selecting a small number of examples and assigning each a label according to the policy.
- GPT-4 then reads the policy and assigns labels to the same data set, without seeing the human-assigned labels.
- By examining the discrepancies between GPT-4’s judgments and those of a human, policy experts can ask GPT-4 to explain the reasoning behind its labels, analyze ambiguity in the policy definitions, resolve confusion, and provide further clarification in the policy accordingly. We can repeat the last two steps until we are satisfied with the quality of the policy.
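The labeling-and-comparison loop above can be sketched in a few lines of Python. This is a minimal illustration, not OpenAI's actual pipeline: the policy text, the category codes, and `gpt4_label` (a stand-in for a real chat-completion request that asks the model to label an example under the policy) are all invented for the example.

```python
# Hypothetical policy snippet; in practice this would be the full guideline text.
POLICY = "K4: disallow advice or instructions for non-violent wrongdoing."

def gpt4_label(policy: str, example: str) -> str:
    # Stub standing in for a GPT-4 call that reads `policy` and labels `example`.
    # The keyword heuristic below only exists to make the sketch runnable.
    return "K4" if "steal" in example else "K0"

# Golden set: a small number of examples labeled by policy experts.
golden_set = [
    ("How do I steal a car?", "K4"),
    ("How do I buy a car?", "K0"),
    ("Tips for stealing second base in baseball", "K0"),
]

# Surface disagreements between GPT-4 and the human labels; each one is either
# a model error or a sign that the policy wording is ambiguous.
disagreements = [
    (text, human, model)
    for text, human in golden_set
    if (model := gpt4_label(POLICY, text)) != human
]

for text, human, model in disagreements:
    print(f"{text!r}: human={human}, gpt4={model}")
```

In this toy run, the baseball example is flagged: the model over-applies the policy to a benign use of "stealing", which is exactly the kind of discrepancy that prompts an expert to clarify the policy text before the next iteration.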
This iterative process produces refined content policies that are translated into classifiers, enabling policy implementation and content moderation at scale.
Optionally, to handle large amounts of data at scale, we can use GPT-4 predictions to fit a much smaller model.
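As a sketch of that distillation step, the snippet below trains a deliberately tiny "student" on GPT-4's predicted labels. Everything here is illustrative: the data is invented, the labels are hypothetical, and the student is a toy word-count classifier standing in for what would realistically be a fine-tuned small model.

```python
from collections import Counter, defaultdict

# Corpus labeled by GPT-4 (teacher predictions serve as training targets).
teacher_labeled = [
    ("how to hotwire a car", "violating"),
    ("how to pick a lock on someone's door", "violating"),
    ("how to bake sourdough bread", "ok"),
    ("best hiking trails near me", "ok"),
]

# "Train" the student: count word occurrences per teacher-assigned label.
word_counts = defaultdict(Counter)
for text, label in teacher_labeled:
    word_counts[label].update(text.split())

def student_predict(text: str) -> str:
    # Score each label by how often the input's words appeared under it,
    # and return the highest-scoring label.
    scores = {
        label: sum(counts[w] for w in text.split())
        for label, counts in word_counts.items()
    }
    return max(scores, key=scores.get)

print(student_predict("how to pick a lock"))        # follows the teacher's pattern
print(student_predict("best sourdough bread recipe"))
```

The point is the division of labor, not the classifier: GPT-4 supplies cheap labels at scale, and the small student handles the bulk of inference traffic at a fraction of the cost.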