How to DETOX a language? ☣️⚠️🤖

So essentially, a language model tuned to avoid toxic language in English will also avoid it in other languages! Paper: Preference Tuning For Toxicity Mitigation Generalizes Across Languages (19 pages). Researchers from Brown University set out to make language models less toxic. The study found that preference tuning with Direct Preference Optimization (DPO) on English-only training data significantly reduces toxicity in LLM generations across 17 languages, including Spanish, Russian, Chinese, and Korean.
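For context, here is a minimal sketch of what DPO-based preference tuning looks like in practice, using Hugging Face's `trl` library. The model name, toy preference pairs, and hyperparameters below are illustrative assumptions, not the paper's exact setup; the key idea is that the preference data (a non-toxic "chosen" continuation vs. a toxic "rejected" one) is English-only.

```python
# A minimal, illustrative DPO sketch with Hugging Face's `trl` library.
# Model name, data, and hyperparameters are assumptions for demonstration,
# NOT the paper's exact configuration.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "gpt2"  # placeholder; the paper evaluates multilingual LLMs
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# English-only preference pairs: for each prompt, a non-toxic "chosen"
# continuation and a toxic "rejected" one (toy examples).
train_dataset = Dataset.from_dict({
    "prompt": ["People who disagree with me are"],
    "chosen": [" still worth listening to respectfully."],
    "rejected": [" idiots who deserve to be insulted."],
})

training_args = DPOConfig(
    output_dir="dpo-detox",
    per_device_train_batch_size=1,
    num_train_epochs=1,
    beta=0.1,  # strength of the KL penalty toward the reference model
)

# ref_model=None lets trl clone a frozen copy of `model` as the reference.
# Note: recent trl versions take `processing_class=`; older ones use `tokenizer=`.
trainer = DPOTrainer(
    model=model,
    ref_model=None,
    args=training_args,
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
trainer.train()
```

The surprising result reported in the paper is that preference pairs like these, written only in English, also suppress toxic generations when the model is prompted in its other languages.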