Journal of Software, 2021, 32(4): 1067-1081

Reliable Multi-modal Learning: A Survey
(School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China; State Key Laboratory for Novel Software Technology (Nanjing University), Nanjing 210023, China; Rutgers Business School, Newark, NJ 07012, USA)
Received: June 17, 2019    Revised: April 28, 2020
Abstract: In recent years, multi-modal learning has become one of the active research topics in machine learning and data mining, and it has been applied successfully in many real-world scenarios, such as cross-media search, multi-language processing, and click-through rate estimation with auxiliary information. Traditional multi-modal learning methods usually exploit the consistency or complementarity among modalities to design loss functions or regularization terms for joint training, thereby improving both single-modal and ensemble performance. In open environments, however, factors such as missing data and noise make multi-modal data imbalanced: the information in an individual modality may be insufficient or missing. This leads to two major challenges, "inconsistent strength of modal feature representations" and "inconsistent modal alignment relationships". Applying traditional multi-modal methods directly to such imbalanced data can even degrade single-modal and ensemble performance. To address these problems, reliable multi-modal learning has been proposed and widely studied. This paper systematically summarizes and analyzes the progress made by researchers in China and abroad on reliable multi-modal learning, and discusses the challenges that future research may face.
Article ID:     CLC number: TP391    Document code:
Foundation items: National Natural Science Foundation of China (61673201, 62006118, 61773198, 61632004); Natural Science Foundation of Jiangsu Province, China (BK20200460); CCF-BAIDU Songguo Foundation (CCF-BAIDU OF2020011); BAIDU TIC Foundation
Reference text:
YANG Yang, ZHAN De-Chuan, JIANG Yuan, XIONG Hui. Reliable Multi-modal Learning: A Survey. Journal of Software, 2021, 32(4): 1067-1081