Journal of Software, 2018, 29(S2): 1-15

Modality Compensation Based Action Recognition
SONG Si-Jie,LIU Jia-Ying,LI Yang-Hao,GUO Zong-Ming
(Institute of Computer Science and Technology, Peking University, Beijing 100871, China)
Received: April 13, 2018    Revised: June 13, 2018
Chinese abstract (translated): With the development of depth cameras, video data of different modalities have become easier to obtain, and multi-modal video action recognition has attracted increasing attention. Data of different modalities describe actions from multiple perspectives; how to effectively exploit multi-modal data so that the modalities complement each other is an important direction in video action recognition. This paper proposes a video action recognition algorithm based on modality compensation. The method takes RGB and optical-flow video data as the source modality and 3D skeleton data as the auxiliary modality, and exploits the correlation between the high-level feature spaces of the source and auxiliary modalities to compensate the feature extraction of the source modality. Based on convolutional neural networks and long short-term memory networks, the algorithm models the spatio-temporal features of the source and auxiliary modal data. On this basis, a modality adaptation block built on a residual sub-network is proposed, which unifies the data distributions of the source-modal and auxiliary-modal features so that the auxiliary modality compensates the features of the source modality. Considering that the source and auxiliary modal data may be aligned to different degrees (e.g., at the action-category or action-sample level), a hierarchical modality adaptation algorithm is designed to accommodate different training data. The proposed algorithm requires the auxiliary modality only during training; at test time, actions can be recognized from the source modal data alone, which greatly broadens its practicality. Experimental results on widely used public datasets show that the proposed algorithm outperforms existing action recognition algorithms.
Abstract: With the prevalence of depth cameras, video data of different modalities have become more common, and multi-modal human action recognition attracts increasing attention. Different modal data describe human actions from distinct perspectives, so how to effectively utilize the complementary information of multi-modal data is a key topic in this area. In this study, we propose a modality compensation based method for action recognition. With RGB/optical flow as source modal data and skeletons as auxiliary modal data, we aim to compensate the feature learning from source modal data by exploring the common spaces between the source and auxiliary modalities. The proposed model is based on a deep convolutional neural network (CNN) and a long short-term memory (LSTM) network to extract spatial and temporal features. With the help of residual learning, a modality adaptation block is proposed to align the distributions of different modalities and achieve modality compensation. To deal with different degrees of alignment between source and auxiliary modal data, we propose hierarchical modality adaptation schemes. The proposed model requires auxiliary modal data only in the training process, and improves recognition performance with source modal data alone in the testing phase, which expands its application scenarios. Experimental results illustrate that the proposed method outperforms other state-of-the-art approaches.
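The abstract describes two core ideas: a residual modality adaptation block (the source features pass through a learnable transform with an identity shortcut, nudging them toward the auxiliary feature space) and a loss that aligns the distributions of source and auxiliary features. The sketch below illustrates both with NumPy. It is a minimal illustration, not the paper's implementation: the weight shapes, the two-layer transform, and the moment-matching loss used as the alignment criterion are all assumptions for demonstration purposes.

```python
import numpy as np

rng = np.random.default_rng(0)

def residual_adaptation_block(x, w1, w2):
    """Residual adaptation: output = input + learned transform of input.

    The identity shortcut lets the block start close to an identity
    mapping, so adaptation only perturbs the source features as needed.
    """
    hidden = np.maximum(x @ w1, 0.0)   # ReLU non-linearity
    return x + hidden @ w2             # residual (shortcut) connection

def moment_alignment_loss(f_src, f_aux):
    """Match first and second moments of two feature batches.

    An illustrative stand-in for the paper's distribution-alignment
    objective between source and auxiliary high-level features.
    """
    mean_gap = np.sum((f_src.mean(axis=0) - f_aux.mean(axis=0)) ** 2)
    cov_gap = np.sum((np.cov(f_src, rowvar=False)
                      - np.cov(f_aux, rowvar=False)) ** 2)
    return mean_gap + cov_gap

dim = 8
w1 = rng.normal(scale=0.1, size=(dim, dim))
w2 = rng.normal(scale=0.1, size=(dim, dim))

f_src = rng.normal(size=(32, dim))             # e.g. RGB/flow features
f_aux = rng.normal(loc=0.5, size=(32, dim))    # e.g. skeleton features

adapted = residual_adaptation_block(f_src, w1, w2)
loss = moment_alignment_loss(adapted, f_aux)
print(adapted.shape)   # (32, 8): same shape as the source features
```

In training, minimizing such an alignment loss would pull the adapted source features toward the auxiliary (skeleton) feature distribution; at test time only the adaptation block is needed, so skeletons are not required, matching the paper's train-only use of the auxiliary modality.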
Foundation items:National Natural Science Foundation of China (61772043)
SONG Si-Jie, LIU Jia-Ying, LI Yang-Hao, GUO Zong-Ming. Modality Compensation Based Action Recognition. Journal of Software, 2018, 29(S2): 1-15