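National Natural Science Foundation of China, General Program (62076259, 62073176); Joint Funds of the National Natural Science Foundation of China, Key Program (U1908214); Science and Technology Innovation 2030 Major Project on New Generation Artificial Intelligence (2021ZD0112400)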
In recent years, reinforcement learning methods based on environment interaction have achieved great success in robotics applications, offering a practical way to optimize robot behavior control policies. However, collecting interaction samples in the real world is costly and inefficient, so simulation environments are widely used for training robot reinforcement learning. Training a policy on large numbers of samples obtained at low cost in a virtual simulation environment, and then transferring the learned policy to the real environment, can effectively alleviate the safety, reliability, and real-time problems of training on real robots. However, because the simulation environment differs from the real one, a policy trained in simulation often fails to achieve the desired performance when transferred directly to a real robot. To address this problem, sim-to-real transfer reinforcement learning methods have been proposed to narrow the gap between the environments and thereby achieve effective policy transfer. According to the direction of information flow during transfer reinforcement learning and the different objects on which the intelligent methods act, this survey proposes a process framework for sim-to-real transfer reinforcement learning systems and, based on this framework, divides existing work into three categories: model optimization methods based on the real environment, knowledge transfer methods based on the simulation environment, and iterative policy improvement methods based on both simulated and real environments. The representative techniques and related work in each category are then described. Finally, the opportunities and challenges facing this research field are discussed.