Efficient World-Wide-Web Information Gathering

Volume 12, Issue 1, 2001, pp. 33-40

Abstract:

As the information available through the World-Wide-Web becomes overwhelming, efficient information gathering (IG) tools are necessary. Because network resources are expensive, IG is a resource-bounded task. The main purpose of this paper is to find an efficient gathering method for a specific topic. The paper presents methods for predicting a page's content without downloading it, designs different control strategies, and defines several page-downloading priority measures. An IG system, TH-Gatherer, was built to test the methods, and a range of experiments was carried out. The experiments show that the content of candidate pages can be predicted approximately without downloading them. When the priority-based gathering strategy and the hybrid measure are used, the gathering efficiency is four times that of the breadth-first search (BFS) strategy, which is used by many current IG tools (including crawlers and off-line browsing tools). The method presented in this paper is suitable for resource-bounded, topic-specific information gathering.
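
As a rough illustration of the idea summarized in the abstract, the following Python sketch contrasts breadth-first gathering with a priority-based strategy on a small in-memory link graph under a fixed download budget. Everything in it is an assumption made for illustration: the toy graph `TOY_WEB`, the anchor-text overlap score in `predict_relevance`, and the `gather` and `efficiency` helpers are hypothetical, and they are not TH-Gatherer's actual prediction methods, priority measures, or control strategies, which the abstract does not spell out.

```python
import heapq
from collections import deque

# Hypothetical toy web: page -> (is_on_topic, [(anchor_text, target_page), ...]).
TOY_WEB = {
    "seed": (False, [("sports news", "a"),
                     ("web crawler research", "b"),
                     ("weather", "c")]),
    "a": (False, [("football scores", "d")]),
    "b": (True, [("focused crawler design", "e"),
                 ("crawler priority queue", "f")]),
    "c": (False, []),
    "d": (False, []),
    "e": (True, []),
    "f": (True, []),
}


def predict_relevance(anchor_text, topic_terms):
    """Estimate the relevance of an undownloaded page from its anchor text alone.

    Assumed measure: the fraction of topic terms that occur in the anchor text.
    """
    words = set(anchor_text.lower().split())
    return len(words & topic_terms) / len(topic_terms)


def gather(web, seed, topic_terms, budget, strategy):
    """Download at most `budget` pages and return them in download order."""
    counter = 0  # tie-breaker so equal scores pop in discovery order
    if strategy == "priority":
        frontier = [(0.0, counter, seed)]  # min-heap on negated score
    else:
        frontier = deque([seed])           # FIFO queue for BFS
    seen = {seed}
    downloaded = []
    while frontier and len(downloaded) < budget:
        if strategy == "priority":
            _, _, page = heapq.heappop(frontier)
        else:
            page = frontier.popleft()
        downloaded.append(page)  # the "download" is just a dictionary lookup here
        _, links = web[page]
        for anchor_text, target in links:
            if target in seen:
                continue
            seen.add(target)
            if strategy == "priority":
                counter += 1
                score = predict_relevance(anchor_text, topic_terms)
                heapq.heappush(frontier, (-score, counter, target))
            else:
                frontier.append(target)
    return downloaded


def efficiency(web, pages):
    """Fraction of downloaded pages that are on topic."""
    return sum(web[p][0] for p in pages) / len(pages)


topic = {"crawler", "web"}
bfs_pages = gather(TOY_WEB, "seed", topic, budget=4, strategy="bfs")
priority_pages = gather(TOY_WEB, "seed", topic, budget=4, strategy="priority")
print("BFS:     ", bfs_pages, efficiency(TOY_WEB, bfs_pages))
print("Priority:", priority_pages, efficiency(TOY_WEB, priority_pages))
```

With the same budget of four downloads, the priority strategy in this toy graph spends its budget on the links whose anchor text looks on topic, while BFS spends it on whatever was discovered first. The numbers it prints describe only this hand-built graph and say nothing about the fourfold gain reported for the real experiments.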

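Citation:

Tian Fanjiang (田范江), Wang Xidong (王曦东), Wang Dingxing (王鼎兴). Efficient World-Wide-Web Information Gathering. Journal of Software (软件学报), 2001, 12(1): 33-40.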
