Directed Graph Model of Uyghur Morphological Analysis

    Abstract:

    Uyghur is a typical agglutinative language. It has strong derivational capacity, a very rich morphological structure, and follows harmony rules. During word formation, phonetic changes such as sound weakening, sound insertion, and sound deletion may occur. These characteristics make Uyghur morphological analysis difficult, including stemming, restoring changed letters, and POS tagging. This paper exploits the hierarchical structure of Uyghur words and proposes a directed graph model for Uyghur morphological analysis. In this model, a word and its tags are described as a directed graph in which nodes represent stems, affixes, and their corresponding tags, while edges represent the transition or generation probabilities between nodes. To handle the phonetic changes that occur at morpheme boundaries (morphological sandhi), the paper also proposes a restoration model that recovers the original form of a changed word. Under the assumption that any letter can change into any other letter, this model converts the restoration problem into a sequence labeling problem, which can be solved by statistical methods. Experimental results on the "Mega-words Corpus of Morphological Analysis of Uyghur", manually annotated by the Xinjiang Multilingual Key Laboratory, show that stemming accuracy reaches 94.7%, and that the F-score for stems and affixes together with their tags reaches 92.6%.
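
    The directed graph idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the lexicon entries (`STEMS`, `AFFIXES`), the tag set, the probability values in `TRANS`, and the function name `analyze` are all hypothetical placeholders, and the strings are artificial rather than real Uyghur. Each node is a (morpheme, tag) pair covering a span of the word, each edge carries a transition probability multiplied by a generation probability, and the analysis is taken to be the highest-probability path from the start of the word to its end.

```python
import math

# Hypothetical toy lexicon: surface form -> [(tag, generation probability)].
# Strings and numbers are placeholders, not real Uyghur data.
STEMS = {"tok": [("N", 0.6), ("V", 0.4)], "toka": [("N", 1.0)]}
AFFIXES = {"a": [("CASE", 1.0)], "mi": [("POSS", 0.7), ("Q", 0.3)]}

# Hypothetical transition probabilities P(next tag | previous tag).
TRANS = {
    ("<s>", "N"): 0.7, ("<s>", "V"): 0.3,
    ("N", "CASE"): 0.5, ("N", "POSS"): 0.4, ("N", "Q"): 0.1,
    ("V", "CASE"): 0.4, ("V", "Q"): 0.6,
    ("CASE", "POSS"): 0.3, ("CASE", "Q"): 0.7,
}


def analyze(word):
    """Return (log_prob, [(morpheme, tag), ...]) for the best path, or None."""
    if not word:
        return None
    n = len(word)
    # best[(position, tag)] = (log probability, path) of the best partial
    # analysis that ends at this character position with this tag.
    best = {(0, "<s>"): (0.0, [])}
    for end in range(1, n + 1):
        for start in range(end):
            piece = word[start:end]
            # The first morpheme must be a stem; later ones must be affixes.
            lexicon = STEMS if start == 0 else AFFIXES
            for tag, gen in lexicon.get(piece, []):
                for (pos, prev_tag), (score, path) in list(best.items()):
                    if pos != start:
                        continue
                    trans = TRANS.get((prev_tag, tag), 0.0)
                    if trans == 0.0:
                        continue  # no edge between these two nodes
                    cand = score + math.log(trans) + math.log(gen)
                    key = (end, tag)
                    if key not in best or cand > best[key][0]:
                        best[key] = (cand, path + [(piece, tag)])
    # Keep only paths that cover the whole word.
    finals = [v for (pos, _), v in best.items() if pos == n]
    return max(finals, key=lambda v: v[0]) if finals else None


result = analyze("tokami")
```

    With these toy numbers, `analyze("tokami")` selects the path `toka/N` followed by `mi/POSS`, and a string with no path through the lexicon, such as `"xyz"`, yields `None`. Working in log space keeps long products of small probabilities numerically stable.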

Get Citation

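Mairehaba Aili, Jiang Wenbin, Wang Zhiyang, Tuergen Yibulayin, Liu Qun. Directed Graph Model of Uyghur Morphological Analysis. Journal of Software (软件学报), 2012, 23(12): 3115-3129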

History
  • Received: April 08, 2011
  • Revised: February 22, 2012
  • Online: December 05, 2012