(School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang 150001)
Uncertain Rule Based Method for Evaluating Data Currency
LI Mo-Han, LI Jian-Zhong, CHENG Si-Yao
(School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China)
Received: May 07, 2014    Revised: August 19, 2014
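> Chinese abstract (translated): Stale data is an important factor affecting data quality, so determining data currency is essential for improving data quality. Current methods for determining data currency fall into two categories: timestamp-based methods and rule-based methods. Timestamp-based methods require precise and complete timestamps, but such timestamps do not exist in many applications. Rule-based methods do not require timestamps, but existing ones all depend on redundant tuples and cannot determine data currency quantitatively. Moreover, these methods are all based on deterministic rules and cannot express uncertain domain knowledge. To address these problems, this paper proposes uncertain currency rules and a corresponding data currency model. Based on this model, two algorithms that determine data currency quantitatively are given, along with an algorithm for learning currency rules. Experimental results on real data verify the effectiveness of the algorithms.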
Abstract: Data staleness is one of the most important factors leading to low data quality. This highlights the need to determine the currency of data, that is, to identify whether a database is up to date. Existing methods for determining data currency all have limitations: some require timestamps, which are often unavailable in practice, and others are based on deterministic currency rules, which can only decide relative currency and cannot express uncertain semantics. To overcome these limitations, this paper introduces a new approach for determining data currency based on uncertain rules. A new class of uncertain currency rules is first introduced. Based on these rules, mathematical models of data currency are proposed, and two algorithms for determining data currency are developed. A method for automatically learning the uncertain currency rules is also provided. Experiments on real-life data verify the effectiveness and efficiency of the proposed methods.
Funding: National Basic Research Program of China (973 Program) (2012CB316202); National Natural Science Foundation of China (61133002)
