Bias Mitigation Techniques in Large Language Models
DOI: https://doi.org/10.61173/4yvqsc19

Keywords: Large language models (LLMs), Prejudice and fairness, Bias mitigation techniques

Abstract
Large language models (LLMs) demonstrate outstanding performance and enormous development potential, and are widely applied in real-life settings. However, LLMs can learn social bias from unprocessed training data and propagate it to downstream tasks, resulting in adverse social effects and potential harm. In this article, we present a survey of bias and fairness research on LLMs, categorizing the metrics and datasets used for bias assessment. Metrics are classified by the model quantities they operate on: embeddings, probabilities, and generated text. Datasets are divided by structure into counterfactual inputs and prompts. We then survey and organize bias mitigation techniques according to the stage at which they intervene: pre-processing (modifying model inputs), in-training (modifying the optimization process), intra-processing (modifying inference behavior), and post-processing (modifying model outputs). Finally, we discuss the key challenges that hinder the fair development of large language models and outline directions for their future evolution.