Extracting Subject from Internet News by String Match
    Abstract:

    Subject extraction from text is an important task in natural language processing. Traditional methods mainly rely on a "thesaurus plus match" approach, which is poorly suited to Internet news because a thesaurus is limited in size and slow to update. After analyzing the structure of news articles, this paper presents a practical method for extracting news subjects without a thesaurus and gives the main implementation procedure. Instead of a large thesaurus, the method exploits the special structure of Internet news to find repeated strings, which express the news subjects very well. Experimental results show that the method extracts the most important subject strings from most Internet news articles rapidly and efficiently. Moreover, the method is equally effective for other Asian languages such as Japanese and Korean, as well as for Western languages.
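
    The idea of finding repeated strings without a thesaurus can be illustrated with a minimal Python sketch. It is not the procedure from the paper: the abstract does not say how the news structure is used, so none of that is modeled here. The function name repeated_strings and the parameters min_len, max_len and min_count are hypothetical, chosen only for illustration. The sketch counts every character window of a given length range, keeps those that occur at least min_count times, and drops a string when a longer repeated string contains it with the same count.

```python
from collections import Counter


def repeated_strings(text, min_len=2, max_len=8, min_count=2):
    """Return repeated substrings of text, longest first (illustrative only)."""
    counts = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(text) - n + 1):
            piece = text[i:i + n]
            # Assumption: candidates may not span whitespace or punctuation.
            # str.isalnum() is also True for CJK characters, so no word
            # segmentation or dictionary is needed.
            if piece.isalnum():
                counts[piece] += 1

    repeated = {s: c for s, c in counts.items() if c >= min_count}

    # Keep only maximal strings: drop s when a longer repeated string
    # contains it and occurs exactly as often.
    maximal = [
        s for s, c in repeated.items()
        if not any(s != t and s in t and repeated[t] == c for t in repeated)
    ]
    return sorted(maximal, key=lambda s: (-len(s), -repeated[s], s))


if __name__ == "__main__":
    print(repeated_strings("stock market falls. stock market news"))
```

    Because the sketch works on raw characters, the same call applies unchanged to text in languages written without spaces between words. The quadratic filtering step is acceptable for a single short article but would need a suffix-based index for larger inputs.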

Get Citation

Yin Zhonghang, Wang Yongcheng, Cai Wei, Han Kesong. Extracting Subject from Internet News by String Match. Journal of Software (软件学报), 2002, 13(2): 159-167

History
  • Received: December 21, 2000
  • Revised: July 12, 2001
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4
Address:4# South Fourth Street, Zhong Guan Cun, Beijing 100190,Postal Code:100190
Phone:010-62562563 Fax:010-62562533 Email:jos@iscas.ac.cn