Abstract:Fuzz testing techniques play a significant role in software quality assurance and software security testing. However, when dealing with systems like compilers that have complex input semantics, existing fuzz testing tools often struggle as a lack of semantic awareness in their mutation strategies leads to the generated programs failing to pass compiler frontend checks. This study proposes a semantically-aware greybox fuzz testing method, aiming at enhancing the efficiency of fuzz testing tools in the domain of compiler testing. It designs and implements a series of mutation operators that can maintain input semantic validity and explore contextual diversity, and develops efficient selection strategies according to the characteristics of these operators. The greybox fuzz testing tool SemaAFL is developed by integrating these strategies with traditional greybox fuzz testing tools. Experimental results indicate that by applying these mutation operators, SemaAFL achieves approximately 14.5% and 11.2% higher code coverage on GCC and Clang compilers compared to AFL++ and similar tools like GrayC. During a week-long experimental period, six previously unknown bugs in GCC and Clang are discovered and reported by SemaAFL.