Journal of Software, 2013, 24(8): 1804-1815

Mining Sequential Patterns with Wildcards and the One-Off Condition
WU Xin-Dong, XIE Fei, HUANG Yong-Ming, HU Xue-Gang, GAO Jun
(School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230009, China; Department of Computer Science, University of Vermont, Burlington, VT 05405, USA; Department of Computer Science and Technology, Hefei Normal University, Hefei 230601, China)
Received: August 05, 2011    Revised: September 12, 2012
Abstract: Many application domains produce large volumes of sequence data, and discovering valuable patterns in such data is the central task of sequential pattern mining. Given a sequence S, a support threshold, and gap constraints, this paper aims to mine all frequent patterns whose number of occurrences (support) in S is no less than the threshold. A pattern P may contain flexible wildcards, and the number of wildcards between any two successive elements of P must satisfy the user-specified gap constraints. The paper presents an efficient mining algorithm, One-Off Mining, in which pattern occurrences satisfy the One-Off condition: no two occurrences of a pattern share a character at the same position in the sequence, so each character of S is used at most once across all occurrences of a pattern. Experiments on DNA sequences show that One-Off Mining achieves better time performance and completeness than related sequential pattern mining algorithms.
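To make the One-Off condition and the gap constraints concrete, the following is a minimal Python sketch of counting occurrences of a single pattern. It is not the paper's One-Off Mining algorithm: it uses a simple greedy, leftmost-first search, so the count it returns may be lower than the best achievable one. The function name `one_off_occurrences` and the parameters `min_gap` and `max_gap` (the allowed number of wildcards between successive pattern elements) are assumptions introduced for illustration.

```python
def one_off_occurrences(seq, pattern, min_gap, max_gap):
    """Greedily collect occurrences of `pattern` in `seq` (hypothetical helper).

    Between two successive pattern elements there must be between
    `min_gap` and `max_gap` wildcard positions. Under the One-Off
    condition, a sequence position may be used by at most one occurrence.
    Returns a list of occurrences, each a list of 0-based positions.
    """
    if not pattern:
        return []

    used = set()  # positions already consumed by an accepted occurrence

    def extend(positions):
        # Try to match the next pattern element after the last matched position.
        k = len(positions)
        if k == len(pattern):
            return positions
        last = positions[-1]
        lo = last + 1 + min_gap
        hi = min(last + 1 + max_gap, len(seq) - 1)
        for nxt in range(lo, hi + 1):
            if nxt not in used and seq[nxt] == pattern[k]:
                found = extend(positions + [nxt])
                if found is not None:
                    return found
        return None  # no valid continuation; the caller backtracks

    occurrences = []
    for start in range(len(seq)):
        if start in used or seq[start] != pattern[0]:
            continue
        occ = extend([start])
        if occ is not None:
            used.update(occ)  # enforce the One-Off condition
            occurrences.append(occ)
    return occurrences


# "AT" with 0 or 1 wildcards between A and T.
print(one_off_occurrences("AGTAGT", "AT", 0, 1))
# In "AAT" both A's could pair with the single T, but T may be used only once.
print(one_off_occurrences("AAT", "AT", 0, 1))
```

Under these assumptions, a pattern would count as frequent when the number of returned occurrences is no less than the support threshold. The greedy choice keeps the sketch short; an exact or better-approximating search would need to consider alternative position assignments.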
Foundation items: National Natural Science Foundation of China (61229301, 60828005, 61273292); US National Science Foundation (CCF-0905337, CCF-0514819); National High-Tech Research and Development Program of China (863 Program) (2012AA011005); National Basic Research Program of China (973 Program) (2013CB329604)
Reference text:

WU Xin-Dong, XIE Fei, HUANG Yong-Ming, HU Xue-Gang, GAO Jun. Mining Sequential Patterns with Wildcards and the One-Off Condition. Journal of Software, 2013, 24(8): 1804-1815