Shaping Reward Learning Approach from Passive Samples
Author:
Affiliation:

Clc Number:

Fund Project:

  • Article
  • |
  • Figures
  • |
  • Metrics
  • |
  • Reference
  • |
  • Related
  • |
  • Cited by
  • |
  • Materials
  • |
  • Comments
    Abstract:

    Reinforcement learning (RL) deals with long-term reward maximization problems via learning correct short-term decisions from on previous experience. It has been revealed that reward shaping, which provides simpler and easier reward functions to replace the actual environmental reward, is an effective way to guide and accelerate reinforcement learning. However, building a shaping reward requires either domain knowledge or demonstrations from an optimal policy, both involve participation of human experts that is costly. This work investigates whether it is possible to automatically learn a better shaping reward along with an RL process. RL algorithms commonly sample a lot of trajectories throughout the learning process. Those passive samples, though containing many failed attempts, may provide useful information for building a shaping reward function. A policy-invariance condition for reward shaping is introduced as a more effective way to handle noisy examples, followed by the RFPotential approach to learn a shaping reward from massive examples efficiently. Empirical studies on various RL algorithms and domains show that RFPotential can accelerate the RL process.

    Reference
    Related
    Cited by
Get Citation

钱煜,俞扬,周志华.一种基于自生成样本学习的奖赏塑形方法.软件学报,2013,24(11):2667-2675

Copy
Share
Article Metrics
  • Abstract:
  • PDF:
  • HTML:
  • Cited by:
History
  • Received:April 06,2013
  • Revised:July 17,2013
  • Adopted:
  • Online: November 01,2013
  • Published:
You are the firstVisitors
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4
Address:4# South Fourth Street, Zhong Guan Cun, Beijing 100190,Postal Code:100190
Phone:010-62562563 Fax:010-62562533 Email:jos@iscas.ac.cn
Technical Support:Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063