Received: May 11, 2020; Revised: June 26, 2020
Abstract: The multimedia world in which human beings live is built from a large number of contents of different modalities, and the information carried by these modalities is highly correlated and complementary. The main purpose of multimodal representation learning is to mine the commonness and characteristics of different modalities and to produce latent vectors that can represent multimodal information. This article surveys the research on the currently widely studied visual-language representation, covering both traditional methods based on similarity models and the current mainstream pre-training methods based on language models. The currently favored approach is to semanticize visual features and then generate joint representations with textual features through a powerful feature extractor; the Transformer is now the mainstream network architecture across representation learning tasks. This article elaborates on the research background, the division of different lines of work, evaluation methods, and future development trends.
Keywords: multimodal representation learning; representation learning; multimodal machine learning; deep learning
Foundation items: National Natural Science Foundation of China (U1836215)
Reference text:
DU Peng-Fei, LI Xiao-Yong, GAO Ya-Li. Survey on Multimodal Visual Language Representation Learning. Journal of Software, 2021, 32(2): 327-348.
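To make the idea described in the abstract concrete, the sketch below shows, in a minimal and illustrative way, how visual region features can be projected ("semanticized") into the same embedding space as text tokens and fused by a Transformer encoder into a joint representation. It is not the method of any particular surveyed paper; all dimensions, the module name ToyVisionLanguageEncoder, and the toy vocabulary size are assumptions chosen for illustration.

```python
# Minimal sketch of visual-language fusion with a Transformer encoder.
# Assumed setup: text tokens as integer ids, visual inputs as pre-extracted
# region features (e.g., 36 detector regions of dimension 2048 per image).
import torch
import torch.nn as nn

class ToyVisionLanguageEncoder(nn.Module):
    def __init__(self, vocab_size=1000, visual_dim=2048, d_model=256,
                 nhead=4, num_layers=2):
        super().__init__()
        self.text_embed = nn.Embedding(vocab_size, d_model)   # text tokens -> d_model
        self.visual_proj = nn.Linear(visual_dim, d_model)     # region features -> d_model
        self.type_embed = nn.Embedding(2, d_model)            # 0 = text, 1 = vision
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)

    def forward(self, token_ids, region_feats):
        # token_ids:    (batch, text_len) integer word ids
        # region_feats: (batch, num_regions, visual_dim) detector outputs
        txt = self.text_embed(token_ids)
        vis = self.visual_proj(region_feats)
        # Add modality-type embeddings so the model can tell text from vision.
        txt = txt + self.type_embed(torch.zeros_like(token_ids))
        vis = vis + self.type_embed(
            torch.ones(region_feats.shape[:2], dtype=torch.long,
                       device=region_feats.device))
        # Concatenate both modalities and let self-attention fuse them.
        return self.encoder(torch.cat([txt, vis], dim=1))

# Usage: joint multimodal representations for a batch of 2 image-text pairs.
model = ToyVisionLanguageEncoder()
tokens = torch.randint(0, 1000, (2, 8))   # 8 text tokens per example
regions = torch.randn(2, 36, 2048)        # 36 visual regions per example
joint = model(tokens, regions)            # shape: (2, 8 + 36, 256)
```

In the pre-training methods discussed in the survey, such a joint encoder is typically trained with objectives like masked language modeling and image-text matching; the sketch only shows the fusion step itself.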