Abstract: In recent years, deep neural networks have achieved significant progress across various domains. However, as typical black-box models, their internal mechanisms remain difficult for humans to understand, posing serious challenges in high-stakes applications such as medical diagnosis, financial risk management, and autonomous driving. Enhancing model interpretability has therefore become one of the core issues in building trustworthy machine learning systems. Existing interpretability methods can be broadly classified into two categories. Information-flow-based explanations analyze the importance of neurons or features to reveal what the model "attended to," but they often fail to provide cognitively meaningful, human-understandable semantics. In contrast, concept-based explanations construct semantic spaces that map internal model representations to interpretable concept structures, thereby answering what the model has understood. These methods offer greater semantic depth and cognitive alignment, making them especially effective at improving semantic transparency and user trust. The fundamental lack of interpretability in deep learning stems from its deficiency in semantic representation; constructing concept spaces and representation mechanisms aligned with human cognition has therefore become a key breakthrough point in the development of explainable models. This paper presents a comprehensive survey of concept-based modeling methods in explainable deep learning. Based on the stage at which interpretability is introduced, existing approaches are categorized into two major paradigms: post-hoc explanations, which extract semantic representations from trained models through techniques such as neuron dissection and semantic clustering, and intrinsic explanations, which incorporate structured priors or semantic constraints during training to endow models with built-in interpretability. Within this classification framework, the survey systematically reviews representative modeling strategies and key methods, compares their semantic transparency and practical applicability, and summarizes current challenges and future research directions. The goal is to provide a structured reference and methodological guidance for understanding and building semantically interpretable deep learning models.
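To make the intrinsic paradigm mentioned above concrete, the following minimal PyTorch sketch shows a concept-bottleneck-style model whose intermediate layer is supervised with human-annotated concept labels, so the semantic constraint is built in during training rather than recovered post hoc. The architecture, layer sizes, loss weight, and all variable names are illustrative assumptions for this sketch, not a method proposed or evaluated in this survey.

```python
# Minimal, illustrative sketch (assumptions only) of an "intrinsic"
# concept-based model: the hidden layer is trained to predict
# human-named concepts before the final task prediction.
import torch
import torch.nn as nn

class ConceptBottleneckNet(nn.Module):
    def __init__(self, in_dim: int, num_concepts: int, num_classes: int):
        super().__init__()
        # Backbone maps raw inputs to a feature vector.
        self.backbone = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU())
        # Concept head: each output is supervised with a labeled concept,
        # so its activations are directly human-interpretable.
        self.concept_head = nn.Linear(128, num_concepts)
        # Task head sees only the (sigmoid-squashed) concept scores.
        self.task_head = nn.Linear(num_concepts, num_classes)

    def forward(self, x):
        concepts = torch.sigmoid(self.concept_head(self.backbone(x)))
        return concepts, self.task_head(concepts)

# Joint training step: task loss plus a semantic constraint tying the
# bottleneck to annotated concept labels (the 0.5 weight is arbitrary).
model = ConceptBottleneckNet(in_dim=32, num_concepts=8, num_classes=3)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(16, 32)                                  # dummy inputs
concept_labels = torch.randint(0, 2, (16, 8)).float()    # dummy concept annotations
y = torch.randint(0, 3, (16,))                           # dummy task labels

concepts, logits = model(x)
loss = nn.functional.cross_entropy(logits, y) \
     + 0.5 * nn.functional.binary_cross_entropy(concepts, concept_labels)
opt.zero_grad()
loss.backward()
opt.step()
```

Because the task head consumes only the concept scores, a prediction can be read off as "which concepts fired," which is the kind of built-in semantic transparency the intrinsic paradigm aims for; post-hoc methods would instead probe a frozen backbone for such concepts after training.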