###
DOI:
Journal of Software:2005.16(7):1262-1269

分布式序列模式发现算法的研究
邹翔,张巍,刘洋,蔡庆生
(公安部第三研究所,科研中心,上海,200031;中国科学技术大学,计算机科学系,安徽,合肥,230027)
Study on Distributed Sequential Pattern Discovery Algorithm
ZOU Xiang,ZHANG Wei,LIU Yang,CAI Qing-Sheng
()
Abstract
Chart / table
Reference
Similar Articles
Article :Browse 3573   Download 3430
Received:November 13, 2003    Revised:February 03, 2005
> 中文摘要: 提出算法FDMSP(fast distributed mining of sequential patterns),以解决分布式环境下的序列模式挖掘问题.首先对分布式环境下序列模式的性质进行了分析.算法采用前缀投影技术划分模式搜索空间,利用序列模式前缀指定选举站点统计序列的全局支持计数,利用局部约减、选举约减、计数约减等方法减少候选序列数,同时将算法分为3个子过程异步运行,使得算法具有较低的I/O开销、内存开销和通信开销,从而高效地生成全局序列模式.实验结果显示,在具有海量数据的局域网环境中,FDMSP算法的性能优于将数据集中后采用GSP算法68.5%~99.5%,并且FDMSP算法具有良好的可伸缩性.
中文关键词: 数据挖掘  序列模式  分布式算法
Abstract:Algorithm FDMSP (fast distributed mining of sequential patterns) is proposed in order to deal with mining sequential patterns in distributed environment and its properties are analyzed. The algorithm utilizes prefix-projected technique to divide the pattern searching space, utilizes polling site associated with prefix to get a global support, and utilizes local pruning, poll pruning and count pruning to decrease candidate sequences. It is divided into three sub-procedures which run asynchronously. As a result, the algorithm has lower I/O cost, memory cost and communication cost, and global sequential patterns are generated with higher efficiency. The experiments show that it outperforms the algorithm GSP after centralizing data by 68.5% to 99.5% and scaleable over LAN with huge amount of data.
文章编号:     中图分类号:    文献标志码:
基金项目:Supported by the National Natural Science Foundation of China under Grant Nos.70171052, 90104030 (国家自然科学基金) Supported by the National Natural Science Foundation of China under Grant Nos.70171052, 90104030 (国家自然科学基金)
Foundation items:
Reference text:

邹翔,张巍,刘洋,蔡庆生.分布式序列模式发现算法的研究.软件学报,2005,16(7):1262-1269

ZOU Xiang,ZHANG Wei,LIU Yang,CAI Qing-Sheng.Study on Distributed Sequential Pattern Discovery Algorithm.Journal of Software,2005,16(7):1262-1269