Open Domain New Word Detection Using the Conditional Random Field Method

    Abstract:

    Open domain new word detection is vital for Chinese natural language processing research. This paper proposes a novel detection algorithm based on conditional random fields (CRF) that treats new word detection as a classification problem. The algorithm separates the boundaries of new words from those of existing words by combining the CRF method with a series of statistical features extracted from a large-scale corpus. The effectiveness of three discretization strategies (K-means, equal-frequency, and information gain) is also compared. Experimental results on a large-scale Web corpus named SogouT show the effectiveness of the proposed algorithm.
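As an illustration of one of the discretization strategies named in the abstract, the following is a minimal sketch of equal-frequency binning. It assumes that discretization is applied to continuous statistical feature values before they are passed to the CRF; the abstract does not spell this out. The function names, the sample scores, and the choice of three bins are hypothetical and are not taken from the paper.

```python
from bisect import bisect_right


def equal_frequency_cuts(values, n_bins):
    """Return n_bins - 1 cut points so that each bin holds roughly
    the same number of the given values."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // n_bins] for i in range(1, n_bins)]


def discretize(value, cuts):
    """Map a continuous value to a bin index in the range 0..len(cuts)."""
    return bisect_right(cuts, value)


# Hypothetical feature values, e.g. one statistical score per candidate string.
scores = [0.1, 0.2, 0.3, 0.4, 1.5, 2.5, 7.0, 9.0, 20.0]
cuts = equal_frequency_cuts(scores, 3)
bins = [discretize(s, cuts) for s in scores]
```

Unlike equal-width binning, the cut points here follow the sorted order of the data, so a heavily skewed feature still spreads its values evenly across the discrete labels.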

Get Citation

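Chen Fei, Liu Yiqun, Wei Chao, Zhang Yunliang, Zhang Min, Ma Shaoping. Open domain new word detection using the conditional random field method. Journal of Software, 2013, 24(5): 1051-1060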

History
  • Received: September 20, 2011
  • Revised: April 23, 2012
  • Online: May 7, 2013