###
Journal of Software:2016.27(9):2332-2347

顺序敏感的多源感知数据填补技术
马茜,谷峪,李芳芳,于戈
(东北大学 计算机科学与工程学院, 辽宁 沈阳 110819)
Order-Sensitive Missing Value Imputation Technology for Multi-Source Sensory Data
MA Qian,GU Yu,LI Fang-Fang,YU Ge
(School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China)
Abstract
Chart / table
Reference
Similar Articles
Article :Browse 938   Download 1196
Received:September 25, 2015    Revised:January 12, 2016
> 中文摘要: 近年来,随着感知网络的广泛应用,感知数据呈爆炸式增长.但是由于受到硬件设备的固有限制、部署环境的随机性以及数据处理过程中的人为失误等多方面因素的影响,感知数据中通常包含大量的缺失值.而大多数现有的上层应用分析工具无法处理包含缺失值的数据集,因此对缺失数据进行填补是不可或缺的.目前也有很多缺失数据填补算法,但在缺失数据较为密集的情况下,已有算法的填补准确性很难保证,同时未考虑填补顺序对填补精度的影响.基于此,提出了一种面向多源感知数据且顺序敏感的缺失值填补框架OMSMVI(order-sensitive missing value imputation framework for multi-source sensory data).该框架充分利用感知数据特有的多维度相关性:时间相关性、空间相关性、属性相关性,对不同数据源间的相似度进行衡量;进而,基于多维度相似性构建以缺失数据源为中心的相似图,并将已填补的缺失值作为观测值用于后续填补过程中.同时考虑缺失数据源的整体分布,提出对缺失值进行顺序敏感的填补,即:首先对缺失值的填补顺序进行决策,再对缺失值进行填补.对缺失值进行顺序填补能够有效缓解在缺失数据较为密集的情况下,由于缺失数据源的完整近邻与其相似度较低引起的填补精度下降问题;最后,对KNN填补算法进行改进,提出一种新的基于近邻节点的缺失值填补算法NI(neighborhood-based imputation),该算法利用感知数据的多维度相似性对缺失数据源的所有近邻节点进行查找,解决了KNN填补算法K值难以确定的问题,也进一步提高了填补准确性.利用两个真实数据集,并与基本填补算法进行对比,验证了算法的准确性及有效性.
Abstract:In recent years, it is recognized that sensing data is growing explosively with widespread use of sensing network. Due to the inherent hardware limitation, the randomness of distribution environment and unconscious errors during data processing, a deluge of missing values are mingled in original sensing data. Thus, imputing the missing values is essential because most of the existed analysis tools are not competent to the data sets containing missing values. So far, there have been many missing data imputation algorithms, however the accuracy of these algorithms is difficult to be guaranteed in the scenario of lumped missing data. Besides, these existing algorithms don't take the imputation order which influences the imputation accuracy into consideration. To address the above issues, this paper proposes an order-sensitive missing value imputation framework called OMSMVI for multi-source sensory data. OMSMVI takes advantages of multi-dimensions relevancy, such as temporal relevancy, spatial relevancy and attributive relevancy of sensing data adequately. The missing-sources-centered similarity graphs are constructed based on multi-dimensions relevancy. At the same time, in the process of missing data imputation, the imputed missing values are used as observations to impute subsequent missing values. Taking the whole distribution of missing sources into consideration, the framework performs order-sensitive missing value imputation, meaning that the order of imputation is ascertained before applying the specific MVI (missing value imputation) methods. Order-sensitive imputation can remit the decrease of imputed result accuracy caused by the lower similarity between missing source and its neighbors when the missing sources are dense. Finally, a new neighborhood-based missing values imputation algorithm NI, which modifies the KNN imputation algorithm, is introduced into the OMSMVI framework. NI uses the multi-dimension similarity to search the missing sources' neighbors which reflect the similarity from multiple dimensions. Such NI algorithm overcomes the shortcoming that parameter K of KNN is difficult to determine. Furthermore, NI algorithm can improve the imputation accuracy further compared to KNN. Two true sensor data sets are used to compare with the baseline MVI methods to verify the accuracy and effectiveness of OMSMVI.
文章编号:     中图分类号:    文献标志码:
基金项目:国家自然科学基金(61472071,61272179);国家重点基础研究发展计划(973)(2012CB316201);中央高校基本科研业务费(N140404013) 国家自然科学基金(61472071,61272179);国家重点基础研究发展计划(973)(2012CB316201);中央高校基本科研业务费(N140404013)
Foundation items:National Natural Science Foundation of China (61472071, 61272179); National Key Basic Research Program of China (973) (2012CB316201); Fundamental Research Funds for Central Universities (N140404013)
Reference text:

马茜,谷峪,李芳芳,于戈.顺序敏感的多源感知数据填补技术.软件学报,2016,27(9):2332-2347

MA Qian,GU Yu,LI Fang-Fang,YU Ge.Order-Sensitive Missing Value Imputation Technology for Multi-Source Sensory Data.Journal of Software,2016,27(9):2332-2347