可解释深度学习的概念建模方法研究综述

作者:王家祺,冯毅,刘华锋,景丽萍,于剑

中图分类号:TP18

基金项目:本研究得到国家科技研发计划(2024YFE0202900)、国家自然科学基金项目(62436001)、国家自然科学基金青年基金(62406019)、北京市自然科学基金青年基金(4244096)、北交大人才基金(2024XKRC075)资助。


A Survey of Concept-Based Modeling Methods for Interpretable Deep Learning
Fund Project: This work was supported by the National Science and Technology Research and Development Program of China (2024YFE0202900), the National Natural Science Foundation of China (62436001), the Young Scientists Fund of the National Natural Science Foundation of China (62406019), the Young Scientists Fund of the Beijing Natural Science Foundation (4244096), and the Talent Fund of Beijing Jiaotong University (2024XKRC075).

    摘要:

    近年来,深度神经网络在多个领域取得了显著进展,但其作为典型的黑盒模型,内部机制仍难以为人所理解,给医疗诊断、金融风控、自动驾驶等高风险应用场景带来了严峻挑战。提升模型的可解释性,已成为实现高可信机器学习的核心问题之一。现有可解释性方法大致可分为两类:基于信息流的解释和基于概念的解释。基于信息流的解释主要侧重于神经元或特征重要性分析,如定位图片中对分类结果起关键作用的像素区域。虽然能揭示模型“关注了什么”,但难以提供具备人类语义的认知解释;相比之下,基于概念的解释通过构建语义空间,将模型内部表示映射为可理解的概念结构,能够以“模型理解了什么”的方式提供更具语义深度和认知契合的解释,在增强语义透明性和用户信任方面展现出独特优势。深度学习的不可解释性源于其语义表达的缺失,因此,如何构建对人类认知友好的概念空间与表示机制,已成为可解释模型研究的关键突破口。本文围绕可解释深度学习中的概念建模方法展开综述,依据建模介入阶段将相关研究划分为事后解释与事中解释两大路径:前者通过神经元解剖、语义聚类等手段挖掘已有模型的概念表示,后者则在训练过程中引入结构化先验或语义约束,以实现模型的内生可解释性。基于该分类框架,本文系统梳理了典型方法的建模思路与代表性成果,比较其在语义透明性与实际应用中的性能差异,并总结当前研究面临的挑战与未来发展方向,旨在为理解和构建语义可解释的深度模型提供系统性参考与方法指引。

    Abstract:

    In recent years, deep neural networks have achieved significant progress across various domains. However, as typical black-box models, their internal mechanisms remain difficult for humans to understand, posing serious challenges in high-stakes applications such as medical diagnosis, financial risk management, and autonomous driving. Enhancing model interpretability has become one of the core issues in building highly trustworthy machine learning systems. Existing interpretability methods can be broadly classified into two categories: information-flow-based explanations and concept-based explanations. Information-flow-based explanations focus on analyzing the importance of neurons or features, for example by localizing the pixel regions that are critical to a classification result; although they can reveal what the model "attended to," they often fail to provide cognitively meaningful, human-understandable semantics. In contrast, concept-based explanations construct semantic spaces to map internal model representations to interpretable concept structures, thereby answering what the model has understood. These methods offer greater semantic depth and cognitive alignment, making them especially effective in improving semantic transparency and user trust. The fundamental lack of interpretability in deep learning stems from its deficiency in semantic representation. Therefore, constructing concept spaces and representation mechanisms aligned with human cognition has become a key breakthrough point in the development of explainable models. This paper presents a comprehensive survey of concept-based modeling methods in explainable deep learning. Based on the stage at which interpretability is introduced, existing approaches are categorized into two major paradigms: post-hoc explanations, which extract semantic representations from trained models through techniques such as neuron dissection and semantic clustering; and intrinsic explanations, which incorporate structured priors or semantic constraints during training to endow models with built-in interpretability. Within this classification framework, this survey systematically reviews representative modeling strategies and key methods, compares their performance in terms of semantic transparency and practical applicability, and summarizes current challenges and future research directions. The goal is to provide a structured reference and methodological guidance for understanding and building semantically interpretable deep learning models.
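To make the intrinsic (in-training) paradigm described above concrete, the following is a minimal sketch of a concept-bottleneck-style classifier, a well-known instance of introducing semantic constraints during training. It is not taken from the surveyed paper; the module layout, dimensions, and the loss weight `lambda_concept` are illustrative assumptions. The idea is that the model first predicts a vector of human-interpretable concept scores and then makes its final prediction from those scores alone, so the concept layer is the only path from input to output.

```python
# Illustrative concept-bottleneck-style model (a sketch, not the paper's method).
# Input features -> interpretable concept scores -> class prediction, with a
# concept-supervision loss acting as the semantic constraint during training.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConceptBottleneckModel(nn.Module):
    def __init__(self, input_dim: int, num_concepts: int, num_classes: int):
        super().__init__()
        # x -> concept logits (one per named concept, e.g. "striped", "has wings")
        self.concept_predictor = nn.Sequential(
            nn.Linear(input_dim, 256), nn.ReLU(),
            nn.Linear(256, num_concepts),
        )
        # concept scores -> class logits (a simple, inspectable linear head)
        self.label_predictor = nn.Linear(num_concepts, num_classes)

    def forward(self, x: torch.Tensor):
        concept_logits = self.concept_predictor(x)
        concept_probs = torch.sigmoid(concept_logits)  # interpretable intermediate
        class_logits = self.label_predictor(concept_probs)
        return concept_logits, class_logits


def training_step(model, x, concept_targets, labels, lambda_concept: float = 1.0):
    # Joint objective: task loss + lambda_concept * concept-supervision loss.
    concept_logits, class_logits = model(x)
    task_loss = F.cross_entropy(class_logits, labels)
    concept_loss = F.binary_cross_entropy_with_logits(
        concept_logits, concept_targets.float()
    )
    return task_loss + lambda_concept * concept_loss


if __name__ == "__main__":
    model = ConceptBottleneckModel(input_dim=512, num_concepts=32, num_classes=10)
    x = torch.randn(8, 512)                  # a batch of precomputed features
    concepts = torch.randint(0, 2, (8, 32))  # binary concept annotations
    labels = torch.randint(0, 10, (8,))
    loss = training_step(model, x, concepts, labels)
    loss.backward()
    print(f"joint loss: {loss.item():.4f}")
```

Because the class prediction depends only on the concept scores, inspecting (or intervening on) those scores explains the decision in human terms; post-hoc approaches such as neuron dissection instead recover such concepts from an already trained model.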

引用本文

王家祺,冯毅,刘华锋,景丽萍,于剑.可解释深度学习的概念建模方法研究综述.软件学报,2026,37(4):0

历史
  • 收稿日期:2025-05-12
  • 最后修改日期:2025-08-15
  • 在线发布日期: 2025-09-02