Online Comment Clustering Based on an Improved Semantic Distance
Author:
Affiliation:

Clc Number:

Fund Project:

  • Article
  • |
  • Figures
  • |
  • Metrics
  • |
  • Reference
  • |
  • Related
  • |
  • Cited by
  • |
  • Materials
  • |
  • Comments
    Abstract:

    An improved semantic distance for short text is proposed. The new method calculates the semantic distance between two word strings as balance of the extent of word sequence alignment and the meaning matching between word strings. First, after linguistic preprocessing, the extent of word sequence alignment is computed by the structural distance which measures the maximum matching based on the HIT-CIR Tongyici Cilin (extended edition). Then the meaning matching between word strings is computed by an improved edit distance which allocates each word a weight according to its word type. Finally, the semantic distance between the word strings is measured as a balance of structural distance and word meaning matching distance. In addition, in order to eliminate the influence of the sentence length, the proposed semantic distance is adjusted using the distinct word count estimated by the Heap's law and Zipf law. Experimental results show that the presented methods are more efficient than the classical edit distance models.

    Reference
    Related
    Cited by
Get Citation

杨震,王来涛,赖英旭.基于改进语义距离的网络评论聚类研究.软件学报,2014,25(12):2777-2789

Copy
Share
Article Metrics
  • Abstract:
  • PDF:
  • HTML:
  • Cited by:
History
  • Received:May 05,2014
  • Revised:August 21,2014
  • Adopted:
  • Online: December 04,2014
  • Published:
You are the firstVisitors
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4
Address:4# South Fourth Street, Zhong Guan Cun, Beijing 100190,Postal Code:100190
Phone:010-62562563 Fax:010-62562533 Email:jos@iscas.ac.cn
Technical Support:Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063