A Comprehensive Review of Prompt Injection Attacks and Defense Mechanisms in Large Language Models
DOI: https://doi.org/10.61173/390f5h97

Keywords: Large Language Models, Prompt Injection Attacks, Defense Mechanisms, GCG Algorithm, Semantic Manipulation, Resource Exploitation, Adaptive Defense, Cybersecurity

Abstract
This review analyzes prompt injection attacks on large language models (LLMs) from 2019 to 2025, addressing critical security challenges as models such as ChatGPT proliferate across sectors. We synthesize advances in detection, classification, and mitigation, proposing a tripartite framework that categorizes attacks by vector (text, image, speech), mechanism (semantic manipulation, resource exploitation), and impact (data breaches, privacy violations). Key attack techniques include the GCG algorithm, DAN jailbreaks, and resource-exhaustion tactics (e.g., Engorgio). We evaluate current defenses for efficacy, highlighting scalability gaps and trade-offs between security and model utility. Future priorities include adaptive defense systems leveraging reinforcement learning, interdisciplinary collaboration at the intersection of ethics and engineering, and open threat-intelligence networks for proactive vulnerability management. This work equips researchers and practitioners with actionable strategies to secure LLM ecosystems against evolving adversarial threats.