Character Pose Generation and Editing Method Based on Large Model and Multimodal Prompt Words

Authors

  • Boxiao Xu

DOI:

https://doi.org/10.61173/8wq6d604

Keywords:

Pose editing, Large model, Multi-modal

Abstract

 

Character pose generation and editing is an important technology for shaping characters in film, animation, and game development. Current mainstream pose editing methods usually focus on text and image input but overlook the equally critical audio modality; moreover, existing techniques generally cannot run on consumer-grade graphics cards. This paper therefore proposes a new technical approach: a large model serves as the multimodal processor, a stable diffusion model serves as an intermediate stage that generates four views of the character after the pose change, and a 3D Gaussian multi-view generation model produces the final model from those views. To fuse the audio modality, the paper introduces a “persona mask” mechanism that fixes a unified audio analysis method for the large model in advance, so that the model accurately recognizes the character's emotion from speech semantics and synchronously maps the emotional features to character actions. All models in this process are called through interfaces. Experiments show that the method can generate poses that closely match the character design on a 4070 graphics card. In the future, we hope this technology can be integrated with skeletal animation to support the editing of character action animations.
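
The three-stage flow described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the author's implementation: the names `PoseRequest`, `PERSONA_MASK`, `VIEWS`, and `edit_pose` are hypothetical, the wording of the persona mask is invented for the example, and the four view names are an assumption (the abstract says only "four views"). Each model is passed in as a plain callable, standing in for whatever interface is actually used.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Assumed view names; the abstract only states that four views are generated.
VIEWS = ("front", "back", "left", "right")

# Hypothetical "persona mask": a fixed instruction sent ahead of every request
# so that the large model analyses audio in the same way each time.
PERSONA_MASK = (
    "You are a character pose director. Infer the speaker's emotion from the "
    "speech content, then describe a body pose that expresses it."
)


@dataclass
class PoseRequest:
    """Multimodal prompt; any subset of the three inputs may be supplied."""
    text: Optional[str] = None
    image_path: Optional[str] = None
    audio_path: Optional[str] = None

    def modalities(self) -> Dict[str, str]:
        """Return only the inputs that were actually provided."""
        fields = {"text": self.text, "image": self.image_path, "audio": self.audio_path}
        return {name: value for name, value in fields.items() if value is not None}


# Stand-ins for the three model interfaces (all hypothetical signatures).
LargeModel = Callable[[str, Dict[str, str]], str]   # (persona mask, inputs) -> pose prompt
ViewGenerator = Callable[[str, str], str]           # (pose prompt, view name) -> image reference
Reconstructor = Callable[[List[str]], str]          # (four image references) -> 3D model reference


def edit_pose(
    request: PoseRequest,
    large_model: LargeModel,
    view_generator: ViewGenerator,
    reconstructor: Reconstructor,
) -> str:
    """Run the pipeline: large model -> four posed views -> 3D reconstruction."""
    inputs = request.modalities()
    if not inputs:
        raise ValueError("at least one of text, image, or audio is required")
    # Stage 1: the large model fuses the modalities into one pose description.
    pose_prompt = large_model(PERSONA_MASK, inputs)
    # Stage 2: the diffusion stage renders the posed character once per view.
    views = [view_generator(pose_prompt, view) for view in VIEWS]
    # Stage 3: the multi-view 3D Gaussian stage builds the final model.
    return reconstructor(views)
```

Passing the models as callables keeps the sketch independent of any particular service; in practice each callable would wrap a call to the corresponding model interface.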

Published

2025-12-19

Section

Articles