Confusion Class Discrimination Techniques for Text Classification

    Abstract:

    This paper analyzes the confusion class phenomenon in text classification and studies confusion class discrimination techniques to improve classification performance. First, a confusion class recognition technique based on the classification error distribution is proposed to identify the confusion class sets in a predefined taxonomy. Second, to discriminate confusion classes effectively, a feature selection approach based on discrimination capability is proposed, in which each candidate feature's capability to discriminate a class pair is evaluated. Finally, a two-stage classifier integrates the baseline classifier with the confusion class classifiers, and the outputs of the two stages are combined into the final result. The second-stage confusion class classifiers are activated only when the class assigned to the input text by the first-stage baseline classifier belongs to a confusion class set; in that case they classify the text again. In comparison experiments, the Newsgroup and 863 Chinese evaluation data collections are used to evaluate the proposed techniques. Experimental results show that the methods significantly improve the performance of single-label multi-class (SMC) classifiers.
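The three steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the error-rate threshold in `find_confusion_pairs`, the smoothed log-ratio score in `discrimination_scores`, and the override rule in `two_stage_predict` are all assumptions chosen for illustration, as are the function names themselves. The sketch also works on class pairs rather than general confusion class sets.

```python
import math
from collections import Counter
from typing import Callable, Dict, FrozenSet, List, Sequence, Set


def find_confusion_pairs(y_true: Sequence[str], y_pred: Sequence[str],
                         threshold: float = 0.2) -> Set[FrozenSet[str]]:
    """Flag class pairs that the baseline classifier often mixes up.

    Assumed criterion: errors between classes a and b (in either
    direction) divided by the number of test documents whose true
    label is a or b must reach `threshold`.
    """
    totals = Counter(y_true)
    errors: Counter = Counter()
    for t, p in zip(y_true, y_pred):
        if t != p:
            errors[frozenset((t, p))] += 1
    pairs = set()
    for pair, n in errors.items():
        a, b = tuple(pair)
        if n / (totals[a] + totals[b]) >= threshold:
            pairs.add(pair)
    return pairs


def discrimination_scores(docs_a: List[List[str]],
                          docs_b: List[List[str]]) -> Dict[str, float]:
    """Score each term by how differently it is distributed in two classes.

    Assumed measure: |log P(t|a) - log P(t|b)| with add-one smoothing.
    A score of 0 means the term does not help separate the pair.
    """
    count_a = Counter(t for doc in docs_a for t in doc)
    count_b = Counter(t for doc in docs_b for t in doc)
    vocab = set(count_a) | set(count_b)
    norm_a = sum(count_a.values()) + len(vocab)
    norm_b = sum(count_b.values()) + len(vocab)
    return {
        t: abs(math.log((count_a[t] + 1) / norm_a)
               - math.log((count_b[t] + 1) / norm_b))
        for t in vocab
    }


def two_stage_predict(text: str,
                      baseline: Callable[[str], str],
                      pair_classifiers: Dict[FrozenSet[str], Callable[[str], str]]
                      ) -> str:
    """Run the baseline, then re-classify only if its output is a confusion class.

    Assumed combination rule: the second-stage output replaces the
    first-stage output whenever a pair classifier is activated.
    """
    first = baseline(text)
    for pair, classifier in pair_classifiers.items():
        if first in pair:
            return classifier(text)
    return first
```

In use, `find_confusion_pairs` would be run on held-out predictions from the baseline classifier, one pair classifier would be trained per returned pair using the top-scoring terms from `discrimination_scores`, and `two_stage_predict` would route each test document accordingly.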

Get Citation

Zhu Jingbo, Wang Huizhen, Zhang Xijuan. Confusion class discrimination techniques for text classification. Journal of Software, 2008, 19(3): 630-639

History
  • Received: July 2, 2006
  • Revised: October 10, 2006
Copyright: Institute of Software, Chinese Academy of Sciences