Abstract: Code snippets found in open-source and enterprise software projects and posted on software development websites are important development resources. However, developers' code search needs often reflect high-level intentions and topics, which are difficult to satisfy with code search techniques based on information retrieval. It is therefore highly desirable for code snippets to be accompanied by semantic tags that reflect their high-level intentions and topics, so as to facilitate code search and understanding. Existing tag generation techniques are mainly oriented toward text content or rely on historical data, and cannot meet the needs of large-scale semantic annotation of code in support of code search and understanding. To address this problem, this paper proposes a software knowledge graph based approach, called KGCodeTagger, that automatically generates semantic tags for code snippets. KGCodeTagger constructs a software knowledge graph from concepts and relations extracted from API documentation and software development Q&A text, and uses the knowledge graph as the basis for semantic tag generation. Given a code snippet, KGCodeTagger identifies and extracts API invocations and concept mentions, and then links them to the corresponding concepts in the software knowledge graph. On this basis, the approach further identifies other concepts related to the linked concepts as candidates, and selects semantic tags from the relevant concepts based on diversity and representativeness. We evaluate the software knowledge graph construction steps of KGCodeTagger and the quality of the generated code tags. The results show that KGCodeTagger can produce a high-quality, meaningful software knowledge graph and semantic code tags that help developers quickly understand the intention of the code.
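For illustration only, the tagging pipeline outlined above (link mentions to graph concepts, expand to related candidates, then select tags by representativeness and diversity) might be sketched as follows. Everything in this sketch is an assumption and not KGCodeTagger's actual implementation: the toy graph, the function names, the scoring formulas (neighbor overlap for representativeness, Jaccard similarity of neighborhoods for redundancy), and the greedy trade-off weight are all hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical toy knowledge graph: concept -> directly related concepts.
GRAPH: Dict[str, Set[str]] = {
    "java.io.FileReader": {"file reading", "character stream"},
    "java.io.BufferedReader": {"file reading", "buffering", "character stream"},
    "file reading": {"java.io.FileReader", "java.io.BufferedReader", "file I/O"},
    "character stream": {"java.io.FileReader", "java.io.BufferedReader"},
    "buffering": {"java.io.BufferedReader"},
    "file I/O": {"file reading"},
}


def link_mentions(mentions: List[str]) -> Set[str]:
    """Keep only the mentions that match a concept in the graph (exact match, for simplicity)."""
    return {m for m in mentions if m in GRAPH}


def candidate_concepts(linked: Set[str]) -> Set[str]:
    """Candidates are the neighbors of the linked concepts, minus the linked concepts themselves."""
    neighbors: Set[str] = set()
    for concept in linked:
        neighbors |= GRAPH[concept]
    return neighbors - linked


def representativeness(candidate: str, linked: Set[str]) -> float:
    """Assumed measure: fraction of linked concepts directly related to the candidate."""
    return len(GRAPH.get(candidate, set()) & linked) / len(linked)


def similarity(a: str, b: str) -> float:
    """Assumed measure: Jaccard similarity of the two concepts' neighborhoods."""
    na, nb = GRAPH.get(a, set()), GRAPH.get(b, set())
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0


def select_tags(mentions: List[str], k: int = 2, trade_off: float = 0.7) -> List[str]:
    """Greedily pick up to k tags, rewarding representativeness and penalizing redundancy."""
    linked = link_mentions(mentions)
    if not linked:
        return []
    remaining = sorted(candidate_concepts(linked))  # sorted for deterministic tie-breaking
    selected: List[str] = []
    while remaining and len(selected) < k:
        def score(c: str) -> float:
            redundancy = max((similarity(c, s) for s in selected), default=0.0)
            return trade_off * representativeness(c, linked) - (1 - trade_off) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # "unknownHelper" is not in the toy graph, so it is dropped during linking.
    print(select_tags(["java.io.FileReader", "java.io.BufferedReader", "unknownHelper"]))
```

The greedy selection is one simple way to balance the two criteria; a real system would likely use richer graph relations and learned or tuned scores.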