Journal of Software, 2020, 31(12): 3808-3822

Adaptive Active Learning for Semi-supervised Learning
LI Yan-Chao, XIAO Fu, CHEN Zhi, LI Bo
(School of Computer Science, School of Software, School of Cyberspace Security, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210023, China; School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, Jiangsu 210094, China)
Received:July 07, 2019    Revised:July 28, 2019
Abstract: Active learning selects examples from a large pool of unlabeled data and submits them to an expert for labeling, with the aim of easing the labeling bottleneck. Existing batch mode active learning algorithms suffer from three limitations: (1) methods that rely on a single selection criterion, or that make assumptions about the data or the model, have difficulty finding unlabeled examples that are both informative (uncertain) and representative; (2) methods whose performance depends on the accuracy of a similarity measure between examples, such as a predefined similarity function or a diversity measure, may perform suboptimally and select redundant examples; (3) noisy labels degrade the performance of batch mode active learning. This study proposes a batch mode active learning method based on deep learning. A deep neural network generates representations (embeddings) of labeled and unlabeled examples, and a label cycle mode links the embeddings of labeled examples to those of unlabeled examples and then back to labeled examples of the same class. This accounts for both the informativeness and the representativeness of examples and makes the algorithm robust to noisy labels. A submodular function ensures diversity in the selected set by reducing redundancy among the chosen examples. Moreover, the weights of the query criteria are optimized adaptively, so the algorithm automatically balances informativeness and representativeness. The proposed method is applied to semi-supervised classification and semi-supervised clustering. For classification, the batch mode active scheme is incorporated into the classification approaches to improve generalization ability. For clustering, the proposed active scheme for constraints is used to speed up convergence and to outperform unsupervised clustering. Extensive experiments on diverse benchmark datasets for different tasks show consistent and substantial improvements over state-of-the-art approaches.
Key words: active learning; semi-supervised learning; classification; clustering
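The abstract describes batch selection that weighs uncertainty against representativeness and uses a submodular function to keep the batch diverse. The following is a minimal sketch of that general idea, not the paper's actual objective: it assumes entropy as the uncertainty score, a facility-location function over cosine similarities as the submodular diversity term, and a fixed `trade_off` weight (the paper optimizes this balance adaptively). The function names `entropy`, `cosine`, and `select_batch` are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy of one predicted class distribution (uncertainty score)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0.0 else 0.0


def select_batch(embeddings, probs, batch_size, trade_off=0.5):
    """Greedily pick up to `batch_size` indices of unlabeled examples.

    Assumed objective (for illustration only, not taken from the paper):
        F(S) = trade_off * sum_{i in S} entropy_i
             + (1 - trade_off) * sum_j max_{i in S} sim(i, j)
    The second term is the facility-location function, which is monotone
    submodular when similarities are non-negative.
    """
    n = len(embeddings)
    # Clip negative cosine values to 0 so the coverage term stays monotone.
    sim = [[max(0.0, cosine(embeddings[i], embeddings[j])) for j in range(n)]
           for i in range(n)]
    unc = [entropy(p) for p in probs]
    covered = [0.0] * n  # best similarity of each example to the selected set
    selected = []
    for _ in range(min(batch_size, n)):
        best_i, best_gain = None, -1.0
        for i in range(n):
            if i in selected:
                continue
            # Marginal coverage gain of adding example i to the selected set.
            cover_gain = sum(max(sim[i][j] - covered[j], 0.0) for j in range(n))
            gain = trade_off * unc[i] + (1.0 - trade_off) * cover_gain
            if gain > best_gain:
                best_i, best_gain = i, gain
        selected.append(best_i)
        covered = [max(covered[j], sim[best_i][j]) for j in range(n)]
    return selected
```

With `trade_off=1.0` the sketch reduces to plain uncertainty sampling, which can pick near-duplicate examples; lowering `trade_off` lets the coverage term steer the batch toward examples that differ from those already chosen.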
CLC number: TP181
Foundation items: National Natural Science Foundation of China (61932013); Natural Science Foundation of Jiangsu Province of China (BK20200739); Research Foundation of Jiangsu for 333 High Level Talents Training Project (BRA2020065)
Reference text:

LI Yan-Chao, XIAO Fu, CHEN Zhi, LI Bo. Adaptive Active Learning for Semi-supervised Learning. Journal of Software, 2020, 31(12): 3808-3822