This study proposes a feature-constrained distillation learning method for visual anomaly detection, which makes full use of the features of a teacher network to guide the student network in efficiently identifying anomalous images. Specifically, the Vision Transformer (ViT) is introduced as the backbone for the anomaly detection task, and a central feature strategy is proposed to constrain the output features of the student network. Since the teacher network has strong feature expressiveness, the central feature strategy dynamically generates feature representation centers of normal samples for the student network from the teacher network. This improves the student network's ability to describe the features of normal data and thereby widens the feature discrepancy between the student and teacher networks on anomalous data. In addition, to minimize the difference between the student and teacher networks in the feature representation of normal images, a Gram loss function is introduced to constrain the relationships among the encoding layers of the student network. Experiments on three general anomaly detection datasets and one real-world industrial anomaly detection dataset show that the proposed method achieves significant performance improvements over state-of-the-art methods.
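The two constraints described above can be illustrated with a minimal numpy sketch. The abstract does not give the exact formulations, so the following is a hedged interpretation under stated assumptions: the normal-sample feature centers are maintained as an exponential moving average of teacher batch features, the center constraint is a mean-squared error between student features and those centers, and the Gram loss matches Gram matrices computed from corresponding student and teacher encoder-layer features. All function names and the EMA momentum value are illustrative, not from the paper.

```python
import numpy as np

def update_centers(centers, teacher_feats, momentum=0.9):
    """EMA update of normal-sample feature centers from the teacher.

    Assumption: centers are refreshed each step from the batch mean of
    teacher features (shape: [batch, dim]); momentum is illustrative.
    """
    return momentum * centers + (1.0 - momentum) * teacher_feats.mean(axis=0)

def center_feature_loss(student_feats, centers):
    """Pull student features of normal samples toward the teacher-derived
    centers (assumed MSE formulation)."""
    return float(np.mean((student_feats - centers) ** 2))

def gram_loss(student_layers, teacher_layers):
    """Match Gram matrices of corresponding encoder-layer features.

    Each layer's features are assumed to have shape [tokens, dim]; the
    Gram matrix captures pairwise token relations within a layer.
    """
    total = 0.0
    for s, t in zip(student_layers, teacher_layers):
        gram_s = s @ s.T / s.shape[1]   # normalize by feature dimension
        gram_t = t @ t.T / t.shape[1]
        total += np.mean((gram_s - gram_t) ** 2)
    return float(total / len(student_layers))

def anomaly_score(teacher_feats, student_feats):
    """At test time, a larger teacher-student feature discrepancy
    indicates a more anomalous input (assumed L2 distance)."""
    return float(np.linalg.norm(teacher_feats - student_feats))
```

In this reading, training on normal data minimizes `center_feature_loss` plus `gram_loss`, so the student mimics the teacher only on the normal distribution; at inference, anomalous inputs produce features the student cannot reproduce, and `anomaly_score` grows accordingly.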