This paper studies user reviews posted from smart mobile devices, referred to here as "light reviews". It points out the similarities and differences between light reviews and both early Internet reviews and the texts studied in short-text research, and it summarizes the distinctive characteristics of light reviews from experiments: individual reviews contain few words, review length spans a wide range, very short reviews are by far the most numerous, and review length versus review count follows a power-law distribution. A series of sentiment classification experiments on light reviews yields four findings: (1) classification performance decreases as review length increases; (2) traditional feature selection and feature weighting methods do not perform well enough on light reviews; (3) the proportion of polar words, the most important features in sentiment analysis, is higher in short reviews than in long reviews; (4) short and long reviews overlap heavily in the terms they use. Based on these findings, the paper proposes a feature selection method based on feature co-occurrence in short reviews. The method combines the informative content of short reviews with traditional feature selection, filtering out useless noise while adding effective features that benefit classification. Experimental results show that the method effectively improves classification performance on the longer light reviews.
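The abstract does not specify how co-occurrence in short reviews is turned into a selection rule, so the following is only a minimal sketch of one plausible reading, not the paper's algorithm. It assumes a baseline feature set has already been chosen by some traditional method, and it then adds the terms that most often share a short review with those baseline features. The function names `cooccurrence_counts` and `augment_features`, the `extra_k` parameter, and the toy data are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(short_reviews):
    """Count how many short reviews contain each unordered pair of distinct terms."""
    pairs = Counter()
    for tokens in short_reviews:
        # Deduplicate and sort so each pair is counted once per review, in a fixed order.
        for a, b in combinations(sorted(set(tokens)), 2):
            pairs[(a, b)] += 1
    return pairs


def augment_features(base_features, short_reviews, extra_k):
    """Return the baseline features plus up to extra_k co-occurring terms.

    Assumption: a candidate term is scored by the number of short reviews
    it shares with any baseline feature. This scoring is illustrative only.
    """
    base = set(base_features)
    scores = Counter()
    for (a, b), n in cooccurrence_counts(short_reviews).items():
        if a in base and b not in base:
            scores[b] += n
        elif b in base and a not in base:
            scores[a] += n
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return base | {term for term, _ in ranked[:extra_k]}


# Toy, made-up tokenized short reviews.
short_reviews = [
    ["great", "app"],
    ["great", "smooth"],
    ["bad", "crash"],
    ["bad", "crash", "slow"],
]
# Hypothetical baseline, e.g. polar words kept by a traditional selector.
base = {"great", "bad"}
selected = augment_features(base, short_reviews, extra_k=2)
print(sorted(selected))
```

In this sketch the baseline selector is responsible for removing noise, and the co-occurrence step is responsible for adding features back; the two are kept separate so that either one can be swapped for a different method.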