Journal of Software:2012.23(6):1588-1601

(浙江大学 计算机科学与技术系,浙江 杭州 310027)
Efficient Processing of Similarity Search on Uncertain Set-Valued Data
(Department of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China)
Chart / table
Similar Articles
Article :Browse 2828   Download 2792
Received:February 24, 2011    Revised:May 18, 2011
> 中文摘要: 基于可能世界的不确定集合的相似查询,从语义上或者从计算方法的角度来看,都有别于传统的确定型集合上的技术.由于集合中的项存在不确定性,即一个项出现在集合中是有一定概率的,使得传统处理集合的技术不再适用.提出了一个基于可能世界的集合期望相似度的度量公式.在期望的度量公式中,如果一对集合(X,Y)的期望相似度大于给定的阈值τ∈(0,1),则被称为相似集合对.一般的算法,在基于可能世界的情况下计算不确定集合的期望相似度,其复杂度是指数级的.提出了利用动态规划来计算集合期望相似度的算法,该算法的复杂度是多项式级别,极大地减少了计算时间.实验结果表明了基于该算法查询的可用性和高性能.
Abstract:Setting similarity search on possible worlds is semantically and computationally different from the traditional technique for sets of certain data. Considering the uncertainty of items of set, i.e. there is a certain probability for an item appearing in a set, the traditional technique used for processing sets is not applicable. This paper brings forward the formulas to measure the expected similarity of the sets based on possible worlds' semantics. In the expected contexts, if the expected similarity of a pair of sets (X,Y) is larger than a given threshold value τ ∈(0,1), this pair could be called as similar set pair. In the normal algorithm, the complexity of the expected similarity of uncertain sets based on possible worlds is of exponential order. This paper has provided new algorithms to calculate expected similarity by dynamic programming. The complexity of these algorithms is of polynomial order and they reduce execution time greatly. The final experiments have indicated the usability and the high performance of the new algorithms.
文章编号:     中图分类号:    文献标志码:
基金项目:国家自然科学基金(60803003, 60970124) 国家自然科学基金(60803003, 60970124)
Foundation items:
Reference text:


CHEN Ke,HONG Yin-Jie,CHEN Gang.Efficient Processing of Similarity Search on Uncertain Set-Valued Data.Journal of Software,2012,23(6):1588-1601