Dynamic decision making combining explicit knowledge reasoning and deep reinforcement learning
Authors:
Affiliations:

Author biographies:

Corresponding authors:

伍楷舜, E-mail: wu@szu.edu.cn; 林方真, E-mail: flin@cse.ust.hk

CLC number: TP311

Funding:

National Natural Science Foundation of China (61806132, U2001207, 61872248); Natural Science Foundation of Guangdong Province (2017A030312008); Shenzhen Natural Science Fund (ZDSYS20190902092853047, R2020A045); Pearl River Talent Plan (2019ZT08X603); Guangdong Universities Innovation Team Project (2019KCXTD005)


Dynamic decision making framework based on explicit knowledge reasoning and deep reinforcement learning

Abstract:

In recent years, deep reinforcement learning has been widely and successfully applied to sequential decision making, and is especially effective in scenarios with high-dimensional inputs and large state spaces. However, related methods still have limitations, such as a lack of interpretability and inefficient, cold-start-prone early training. To address these problems, we propose a dynamic decision framework that combines explicit knowledge reasoning with deep reinforcement learning. Through explicit knowledge representation, the framework embeds human prior knowledge into agent training, so that the agent's decisions during reinforcement learning are guided by knowledge-reasoning results, improving training efficiency and model interpretability. The explicit knowledge in this paper is of two kinds: heuristic acceleration knowledge and avoidance-style safety knowledge. The former intervenes in the agent's decisions in the early stage of training to speed up learning, while the latter prevents the agent from taking catastrophic actions, making training more stable. Experiments show that the framework significantly improves training efficiency and adds interpretability across different reinforcement learning algorithms and application scenarios.

    Abstract:

In recent years, deep reinforcement learning has been widely used in sequential decision making. The approach works well in many applications, especially in scenarios with high-dimensional inputs and large state spaces. However, these deep reinforcement learning methods have some limitations, such as a lack of interpretability, inefficient initial training, and cold start. In this paper, we propose a framework combining explicit knowledge reasoning with deep reinforcement learning to alleviate these problems. The framework leverages high-level prior knowledge in the deep learning process via explicit knowledge representation, improving both training efficiency and interpretability. The explicit knowledge is categorized into two kinds, namely acceleration knowledge and safety knowledge. The former intervenes in training, especially at the early stage, to speed up the learning process, while the latter keeps the agent from catastrophic actions. Our experiments in several domains with several baselines show that the proposed framework significantly improves training efficiency and interpretability, and that the improvement generalizes across different reinforcement learning algorithms and scenarios.
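The page shows only the abstract, not the paper's implementation. As an illustration of the two knowledge types it describes, the following is a minimal, hypothetical Python sketch: a wrapper around a base policy in which acceleration rules suggest actions during early training and safety rules veto catastrophic ones. All class, function, and parameter names here are our own illustrative assumptions, not the paper's.

```python
import random


class KnowledgeGuidedAgent:
    """Illustrative sketch (not the paper's code): a base RL policy
    wrapped with two kinds of explicit knowledge."""

    def __init__(self, n_actions, accel_rules, safety_rules, accel_horizon=1000):
        self.n_actions = n_actions
        self.accel_rules = accel_rules      # state -> suggested action, or None
        self.safety_rules = safety_rules    # (state, action) -> True if safe
        self.accel_horizon = accel_horizon  # only accelerate during early training
        self.step_count = 0

    def base_policy(self, state):
        # Placeholder for the learned policy (e.g., a DQN's greedy action).
        return random.randrange(self.n_actions)

    def act(self, state):
        self.step_count += 1
        # 1) Heuristic acceleration knowledge: early in training, follow a
        #    human-provided rule when one fires, to speed up learning.
        action = None
        if self.step_count <= self.accel_horizon:
            action = self.accel_rules(state)
        if action is None:
            action = self.base_policy(state)
        # 2) Avoidance-style safety knowledge: veto catastrophic actions and
        #    fall back to any safe alternative.
        if not self.safety_rules(state, action):
            safe = [a for a in range(self.n_actions) if self.safety_rules(state, a)]
            if safe:
                action = random.choice(safe)
        return action


# Toy usage: 1-D walk where action 0 = left, 1 = right; never step left from x == 0.
agent = KnowledgeGuidedAgent(
    n_actions=2,
    accel_rules=lambda s: 1 if s < 5 else None,         # early heuristic: head right
    safety_rules=lambda s, a: not (s == 0 and a == 0),  # veto falling off the left edge
)
print(agent.act(0))  # prints 1: the heuristic suggests "right", which is also safe
```

In this sketch the knowledge module only filters actions at decision time; how the paper integrates the reasoning results into the learning update itself is described in the full text, not on this page.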

Cite this article:

张昊迪, 陈振浩, 陈俊扬, 周熠, 连德富, 伍楷舜, 林方真. Dynamic decision making combining explicit knowledge reasoning and deep reinforcement learning. 软件学报 (Journal of Software).
History
  • Received: 2021-09-05
  • Revised: 2021-10-14
  • Accepted:
  • Published online: 2022-01-28
  • Published in print: