Journal of Software, 2013, 24(1): 25-36

Point-Based Online Value Iteration Algorithm for POMDPs
WU Bo, WU Min, SHE Jin-Hua
(School of Information Science and Engineering, Central South University, Changsha 410083, China;Hu'nan Engineering Laboratory for Advanced Control and Intelligent Automation, Changsha 410083, China;Education Technology and Information Center, Shenzhen Polytechnic, Shenzhen 518055, China;School of Computer Science, Tokyo University of Technology, Tokyo 192-0982, Japan)
Received:February 03, 2012    Revised:May 18, 2012
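> Abstract (translated from Chinese): Partially observable Markov decision processes (POMDPs) are an ideal model for sequential decision-making in dynamic, uncertain environments. However, existing offline algorithms suffer from the "curse of dimensionality" and the "curse of history" of the belief state, while existing online algorithms cannot satisfy the requirements of low error and high real-time performance at the same time, so the ideal POMDP model cannot be applied in practical engineering. To address this, a point-based online value iteration algorithm for POMDPs (PBOVI) is proposed. The algorithm performs update operations on given reachable belief points, which avoids solving over the whole belief simplex and speeds up the solution; it uses branch-and-bound pruning to prune the belief-state AND/OR tree online; and it introduces the idea of belief-node reuse, reusing the belief points already solved at the previous time step to avoid repeated computation. Experimental results show that the algorithm has a low error rate and fast convergence, and that it meets the real-time requirements of the system.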
Abstract: Partially observable Markov decision processes (POMDPs) provide a rich framework for sequential decision-making in stochastic, uncertain domains. However, solving POMDPs offline is typically computationally intractable because the belief state suffers from two curses, dimensionality and history, while existing online algorithms cannot simultaneously satisfy the requirements of low error and real-time performance. To address these problems, this paper proposes a point-based online value iteration (PBOVI) algorithm for POMDPs. To speed up solving, the algorithm performs value backups at specific reachable belief points rather than over the entire belief simplex. It uses a branch-and-bound approach to prune the AND/OR tree of belief states online, and it reuses the belief states computed at the previous time step to avoid repeated computation. Experimental and simulation results show that the proposed algorithm reduces the cost of computing policies while retaining policy quality, so it can meet the requirements of a real-time system.
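To make the idea of a value backup at a single belief point concrete, the following is a minimal sketch of the standard Bayesian belief update and the generic point-based backup used by point-based POMDP solvers. It is not the paper's PBOVI implementation: the branch-and-bound pruning and belief-node reuse described above are not shown, and the two-state model (`T`, `O`, `R`, `GAMMA`) and all function names are hypothetical, chosen only for illustration.

```python
from typing import List

Vector = List[float]

# Hypothetical two-state, two-action, two-observation model (illustration only).
# T[a][s][s2]: transition probability, O[a][s2][o]: observation probability,
# R[a][s]: immediate reward.
T = [[[1.0, 0.0], [0.0, 1.0]],      # action 0 leaves the state unchanged
     [[0.5, 0.5], [0.5, 0.5]]]      # action 1 resets the state uniformly
O = [[[0.85, 0.15], [0.15, 0.85]],  # action 0 yields an informative observation
     [[0.5, 0.5], [0.5, 0.5]]]      # action 1 yields an uninformative one
R = [[-1.0, -1.0], [10.0, -100.0]]
GAMMA = 0.95


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equally sized vectors."""
    return sum(x * y for x, y in zip(u, v))


def belief_update(b: Vector, a: int, o: int, T, O) -> Vector:
    """Bayes filter: b'(s2) is proportional to O[a][s2][o] * sum_s T[a][s][s2] * b[s]."""
    n = len(b)
    unnorm = [
        O[a][s2][o] * sum(T[a][s][s2] * b[s] for s in range(n))
        for s2 in range(n)
    ]
    z = sum(unnorm)
    if z == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [x / z for x in unnorm]


def point_based_backup(b: Vector, alphas: List[Vector], T, O, R, gamma: float) -> Vector:
    """One value backup at belief point b against the current alpha-vector set.

    Returns the single alpha-vector that is optimal at b, instead of
    enumerating vectors for the whole belief simplex.
    """
    n = len(b)
    best_vec: Vector = []
    best_val = float("-inf")
    for a in range(len(T)):
        vec = list(R[a])
        for o in range(len(O[a][0])):
            # Project every alpha-vector back through (a, o).
            candidates = [
                [sum(T[a][s][s2] * O[a][s2][o] * alpha[s2] for s2 in range(n))
                 for s in range(n)]
                for alpha in alphas
            ]
            # Keep only the projection that is best at this belief point.
            g = max(candidates, key=lambda c: dot(c, b))
            vec = [v + gamma * gi for v, gi in zip(vec, g)]
        val = dot(vec, b)
        if val > best_val:
            best_vec, best_val = vec, val
    return best_vec
```

Calling `belief_update([0.5, 0.5], 0, 0, T, O)` shifts the belief toward the first state, and `point_based_backup` can then be applied at the updated belief. Because each backup touches one belief point, its cost grows with the number of alpha-vectors, actions, and observations rather than with the size of the belief simplex.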
Foundation items: National Natural Science Foundation of China (61074058, 60874042); Doctoral Program Foundation of the Ministry of Education of China (20090162120068)
Reference text:


WU Bo, WU Min, SHE Jin-Hua. Point-Based Online Value Iteration Algorithm for POMDPs. Journal of Software, 2013, 24(1): 25-36