Abstract: Deep learning (DL) systems have powerful learning and reasoning capabilities and are widely used in many fields, such as unmanned vehicles, speech processing, and intelligent robotics. Because they rely on limited datasets and manually labeled data, DL systems often fail to detect erroneous behaviors. Accordingly, the quality of DL systems has received widespread attention, especially in safety-critical fields. Because fuzzing has proven effective at detecting faults in traditional programs, applying it to test DL systems has become an active research area in recent years. In this study, we present a systematic review of fuzzing for DL systems, focusing on test case generation (including seed queue construction, seed selection, and seed mutation), test result determination, and coverage analysis, and then introduce commonly used datasets and metrics. We also discuss open issues and opportunities for future research in this field.