Journal of Software:2004.15(2):179-184

Two Effective Functions on Hashing URL
LI Xiao-Ming, FENG Wang-Sen
(Department of Computer Science and Technology, Peking University, Beijing 100871)
Received:March 05, 2003    Revised:June 18, 2003
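Chinese abstract (translated): Research on Web information processing often requires hashing very large sequences of URLs. Targeting two typical application scenarios, namely information lookup in Web structure analysis and load balancing in parallel search engines, a large-scale experimental evaluation was carried out on a sequence of more than 20 million URLs. The results show that ELFhash, which much of the literature recommends as a good hash function for strings, does not hash URLs well. Two functions that hash URLs well are recommended instead.
Keywords: hashing; ELFhash; URL; uniform distribution; Web mining; load balancing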
Abstract: Hashing large collections of URLs is unavoidable in many Web research activities. This paper compares three hash functions in a large-scale experiment. Two metrics were developed for the comparison, related to Web structure analysis and Web crawling, respectively. The finding is that ELFhash, the well-known function for hashing sequences of symbols, does not perform well on URLs, while the other two functions perform better and are therefore recommended.
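For readers unfamiliar with ELFhash, the following is a minimal Python sketch of the function in its commonly published form, kept to 32 bits. The helper `bucket_counts` and the sample URLs are hypothetical illustrations, not taken from the paper. The bucket-occupancy count is only one simple way to inspect how evenly a hash spreads keys, and is not claimed to be either of the paper's two metrics. The two recommended functions are not specified in this abstract, so they are not shown.

```python
from collections import Counter


def elf_hash(s: str) -> int:
    """ELFhash in its commonly published form, emulating 32-bit arithmetic."""
    h = 0
    for byte in s.encode("utf-8"):
        # Shift in the next byte, truncating to 32 bits.
        h = ((h << 4) + byte) & 0xFFFFFFFF
        # Fold the top four bits back into the lower bits, then clear them.
        g = h & 0xF0000000
        if g:
            h ^= g >> 24
        h &= ~g & 0xFFFFFFFF
    return h


def bucket_counts(urls, num_buckets):
    """Hypothetical helper: count how many URLs land in each bucket."""
    counts = Counter(elf_hash(u) % num_buckets for u in urls)
    return [counts.get(i, 0) for i in range(num_buckets)]


if __name__ == "__main__":
    # Illustrative URLs only; the paper's data set is far larger.
    sample = [f"http://www.example.com/dir{i % 7}/page{i}.html" for i in range(1000)]
    print(bucket_counts(sample, 16))
```

Because the top four bits are cleared on every step, the result always fits in 28 bits. On a real URL collection, a strongly uneven list from `bucket_counts` would suggest poor dispersion for that bucket count.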
Foundation item: Supported by the National Grand Fundamental Research 973 Program of China under Grant No.G1999032706
Reference text:

LI Xiao-Ming, FENG Wang-Sen. Two Effective Functions on Hashing URL. Journal of Software, 2004, 15(2): 179-184