Abstract: Reinforcement learning is a technique that discovers optimal strategies through trial and error, and it has become a general method for solving environmental interaction problems. However, as a machine learning method, reinforcement learning inherits the explainability problem common to machine learning. This lack of explainability limits the application of reinforcement learning in safety-sensitive fields, e.g., medicine, the military, and transportation, and leads to a lack of universally applicable solutions for environmental simulation and task generalization. Although many works have been devoted to overcoming this weakness, the academic community still lacks a consistent understanding of explainable reinforcement learning. In this paper, we explore the basic problems of explainable reinforcement learning and review existing works. To begin with, we explore the parent problem, i.e., explainable artificial intelligence, and summarize its existing definitions. Next, we construct a theoretical framework of interpretability to describe the problems common to explainable reinforcement learning and explainable artificial intelligence, covering intelligent algorithms versus mechanical algorithms, interpretation, factors that affect interpretability, and the intuitiveness of explanations. Then, three problems unique to explainable reinforcement learning, i.e., environmental interpretation, task interpretation, and strategy interpretation, are defined based on the characteristics of reinforcement learning. After that, the latest research on explainable reinforcement learning is reviewed, and existing methods are systematically classified. Finally, we discuss future research directions.