SVM+BiHMM: A Hybrid Statistic Model for Metadata Extraction
DOI:
Author:
Affiliation:

Clc Number:

Fund Project:

  • Article
  • |
  • Figures
  • |
  • Metrics
  • |
  • Reference
  • |
  • Related
  • |
  • Cited by
  • |
  • Materials
  • |
  • Comments
    Abstract:

    This paper proposes SVM+BiHMM, a hybrid statistic model of metadata extraction based on SVM (support vector machine) and BiHMM (bigram HMM (hidden Markov model)). The BiHMM model modifies the HMM model with both Bigram sequential relation and position information of words, by means of distinguishing the beginning emitting probability from the inner emitting probability. First, the rule based extractor segments documents into line-blocks. Second, the SVM classifier tags the blocks into metadata elements. Finally, the SVM+BiHMM model is built based on the BiHMM model, with the emitting probability adjusted by the Sigmoid function of SVM score, and the transition probability trained by Bigram HMM. The SVM classifier benefits from the structure patterns of document line data while the Bigram HMM considers both words' Bigram sequential relation and position information, so the complementary SVM+BiHMM outperforms HMM, BiHMM, and SVM methods in the experiments on the same task.

    Reference
    Related
    Cited by
Get Citation

张 铭,银 平,邓志鸿,杨冬青. SVM+BiHMM:基于统计方法的元数据抽取混合模型.软件学报,2008,19(2):358-368

Copy
Share
Article Metrics
  • Abstract:
  • PDF:
  • HTML:
  • Cited by:
History
  • Received:March 28,2006
  • Revised:June 07,2007
  • Adopted:
  • Online:
  • Published:
You are the firstVisitors
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4
Address:4# South Fourth Street, Zhong Guan Cun, Beijing 100190,Postal Code:100190
Phone:010-62562563 Fax:010-62562533 Email:jos@iscas.ac.cn
Technical Support:Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063