Bias Mitigation Techniques in Large Language Models
DOI: https://doi.org/10.61173/4yvqsc19

Keywords: Large language models (LLMs), Prejudice and fairness, Bias mitigation techniques

Abstract
Large language models (LLMs) demonstrate outstanding performance and enormous development potential, and are widely applied in real-life settings. However, LLMs can learn social bias from unprocessed training data and propagate it to downstream tasks, resulting in adverse social effects and potential harm. In this article, we present a survey of bias and fairness research on LLMs, categorizing the metrics and datasets used for bias assessment. Metrics are classified by the model quantities they operate on: embeddings, probabilities, and generated text. Datasets are divided by structure into counterfactual inputs and prompts. We then survey and organize bias mitigation techniques according to the stage at which they intervene: pre-processing (modifying model inputs), in-training (modifying the optimization process), intra-processing (modifying inference behavior), and post-processing (modifying model outputs). Finally, we discuss the key challenges that hinder the fair development of large language models and outline directions for their future evolution.