Approach to Chinese Word Segmentation Based on Character-Word Joint Decoding

    Abstract:

    Character-based approaches have greatly improved the performance of Chinese word segmentation in recent years. Backed by powerful machine learning methods, extracting words by combining characters has become the focus of Chinese word segmentation research. Although character-based approaches are very good at discovering out-of-vocabulary words, they are weaker than word-based approaches at segmenting in-vocabulary words, because some internal and external information about the words is lost. This paper proposes a joint decoding strategy that combines a character-based conditional random field model with a word-based bigram language model to segment Chinese character sequences. The experimental results show that the approach performs well and that the two sub-models are well integrated: the joint character-word model improves segmentation performance more effectively than either single model, which makes it suitable for many applications in Chinese information processing.
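
    The abstract does not say how the two models' scores are combined, so the following is only a minimal sketch of one way joint decoding could work, not the authors' method. It assumes a log-linear interpolation between per-character tag scores (a stand-in for the output of a character-based CRF, using the common B/M/E/S tag set) and word bigram log-probabilities, searched with dynamic programming over candidate words. The names `joint_decode`, `word_tag_score`, `weight`, `max_len` and `unseen` are hypothetical.

```python
def word_tag_score(char_scores, start, end):
    """Sum the per-character log-scores of the tags implied by the word
    spanning positions [start, end): S for a single character, else B M* E."""
    if end - start == 1:
        return char_scores[start]["S"]
    score = char_scores[start]["B"] + char_scores[end - 1]["E"]
    score += sum(char_scores[i]["M"] for i in range(start + 1, end - 1))
    return score


def joint_decode(chars, char_scores, bigram_logp,
                 weight=0.5, max_len=4, unseen=-10.0):
    """Return the word sequence maximising an assumed interpolated score.

    chars        list of characters in the sentence
    char_scores  one dict per character mapping tag -> log-score
                 (assumed to come from a character-based model)
    bigram_logp  dict mapping (previous_word, word) -> log-probability
    weight       assumed interpolation weight on the character model
    unseen       assumed flat log-probability for unseen bigrams
    """
    n = len(chars)
    # best[end][word] = (score, start, previous_word) for the best path
    # whose last word is `word` and ends at position `end`.
    best = [dict() for _ in range(n + 1)]
    best[0]["<s>"] = (0.0, None, None)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = "".join(chars[start:end])
            char_part = word_tag_score(char_scores, start, end)
            for prev, (prev_score, _, _) in best[start].items():
                lm_part = bigram_logp.get((prev, word), unseen)
                total = (prev_score + weight * char_part
                         + (1.0 - weight) * lm_part)
                if word not in best[end] or total > best[end][word][0]:
                    best[end][word] = (total, start, prev)
    # Follow back-pointers from the best final word to the sentence start.
    word = max(best[n], key=lambda w: best[n][w][0])
    end, words = n, []
    while end > 0:
        _, start, prev = best[end][word]
        words.append(word)
        end, word = start, prev
    return words[::-1]
```

    In this sketch, setting `weight` to 1.0 reduces the decoder to the character model alone, and setting it to 0.0 reduces it to the word bigram model alone, so the intermediate values correspond to the joint behaviour the abstract describes.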

Get Citation

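Song Yan, Cai Dongfeng, Zhang Guiping, Zhao Hai. Approach to Chinese Word Segmentation Based on Character-Word Joint Decoding. Journal of Software, 2009, 20(9): 2366-2375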

History
  • Received: September 12, 2008
  • Revised: March 5, 2009
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4