Abstract: The rapid development of generative technologies has revealed their potential for real-world applications. The core objective of Pose Guided Person Image Generation (PGPIG) is to transform an input human image into a specified pose while maintaining a high level of appearance consistency. This technology can be applied in fields such as virtual try-on and fashion, video generation and editing for advertising, and multimodal content generation, improving user experience and driving technological innovation. Despite significant progress, the technology still faces several challenges: effective extraction and rearrangement of appearance information during pose transfer, generation of unseen information, consistency preservation, and efficient model training and deployment. Organized around these challenges, this paper analyzes in detail the strategies that current mainstream pose-guided generation methods employ to address them, and discusses their feasibility and limitations in practical applications. It also examines the generative models and pose representation methods commonly used in pose-guided generation, and reviews the datasets used in this field, including their sizes and characteristics, along with the evaluation benchmarks. The paper then discusses applications of the technology in virtual try-on, video generation and editing, and multimodal content generation, and highlights remaining challenges such as the retention of personalized information, generation in complex scenes, and model efficiency and real-time performance. Finally, it discusses potential future development trends of pose-guided generation technology, aiming to provide researchers with a systematic summary and reference that promotes its application and innovation across industries.