This article was accepted at the 5th Workshop on Gender Bias in Natural Language Processing 2024.
Machine translation (MT) systems often translate gender-ambiguous terms (e.g., the English term “the nurse”) into the gendered form most prevalent in their training data (e.g., “enfermera,” the Spanish term for a female nurse). This tendency often reflects and perpetuates harmful stereotypes present in society. Motivated by MT user interfaces that let users resolve gender ambiguity with minimal friction, we study the problem of generating all grammatically correct gendered translation alternatives. We open-source training and test datasets for five language pairs and establish benchmarks for this task. Our key technical contribution is a novel semi-supervised solution for generating alternatives that integrates seamlessly with standard MT models and maintains high performance without requiring additional components or increasing inference overhead.
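To make the task concrete, the following minimal sketch illustrates what "all grammatically correct gendered translation alternatives" could look like for the nurse example. It is not the semi-supervised method described above. The segment representation, the function name `expand_alternatives`, and the two-way `"m"`/`"f"` labels are assumptions made purely for illustration: each gender-sensitive span is tagged with a hypothetical entity id, and all spans that refer to the same entity must agree in gender.

```python
from itertools import product

# Hypothetical gender labels, chosen only for this illustration.
GENDERS = ("m", "f")

def expand_alternatives(segments):
    """Enumerate gendered alternatives for one translation.

    `segments` is a list whose items are either plain strings or
    (entity_id, {"m": form, "f": form}) pairs. Spans that share an
    entity_id always receive the same gender, so agreement between
    a noun and its adjectives is preserved.
    """
    # Collect the distinct gender-ambiguous entities in the sentence.
    entity_ids = sorted({seg[0] for seg in segments if not isinstance(seg, str)})
    results = []
    # One gender assignment per entity, over all combinations.
    for assignment in product(GENDERS, repeat=len(entity_ids)):
        gender_of = dict(zip(entity_ids, assignment))
        text = "".join(
            seg if isinstance(seg, str) else seg[1][gender_of[seg[0]]]
            for seg in segments
        )
        results.append(text)
    return results

# "The nurse is tired." rendered in Spanish with one ambiguous entity.
nurse = [
    (0, {"m": "El enfermero", "f": "La enfermera"}),
    " está ",
    (0, {"m": "cansado", "f": "cansada"}),
    ".",
]
alternatives = expand_alternatives(nurse)
for alt in alternatives:
    print(alt)
```

With a single ambiguous entity the sketch yields two alternatives; with k independent entities it yields 2^k, which is one reason a user interface for resolving ambiguity benefits from a structured representation rather than a flat list of sentences.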