Whispering experts: mitigating toxicity in pre-trained language models by attenuating expert neurons
A major problem with large language models (LLMs) is their unintended ability to generate toxic language. In this work, we ...
A major problem with large language models (LLMs) is their unintended ability to generate toxic language. In this work, we ...