Journal of Software, 2009, 20(7): 1746-1755

(School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang 150001)
Context-Dependent Lexical Paraphrasing Based on Web Mining
Received: May 09, 2007    Revised: March 14, 2008
Abstract: Lexical paraphrasing is the task of acquiring word-level paraphrases. Lexical paraphrases are context dependent: the same word may call for different paraphrases in different contexts. This paper proposes a method for acquiring context-dependent lexical paraphrases that consists of two parts: candidate paraphrase extraction based on web mining, and paraphrase validation based on binary classification. Experiments on the People's Daily corpus show that: (1) the web mining method is feasible for candidate extraction, yielding on average 2.3 correct paraphrases for each test word in each given context sentence; (2) binary classification is effective for paraphrase validation, achieving an F-measure of 0.6023; (3) of the paraphrases extracted by this method, 75.11% and 98.31% cannot be obtained by two widely used context-independent methods, namely the thesaurus-based and the clustering-based methods, respectively. This shows that the proposed context-dependent method effectively complements traditional context-independent methods.
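For readers unfamiliar with the F-measure reported for paraphrase validation, the following minimal sketch shows how such a score is typically computed from a binary classifier's counts. It assumes the balanced form (beta = 1); the abstract does not state which weighting was used, nor the underlying precision and recall. The function name `f_measure` and the example counts are hypothetical and are not taken from the paper.

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-measure of a binary classifier from its confusion counts.

    tp: candidates accepted by the classifier that are correct paraphrases
    fp: candidates accepted by the classifier that are not paraphrases
    fn: correct paraphrases that the classifier rejected
    beta: weight of recall relative to precision (1.0 is an assumption)
    """
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts chosen only to illustrate the arithmetic.
example_score = f_measure(tp=60, fp=40, fn=40)
```

With these illustrative counts, precision and recall are both 0.6, so the balanced F-measure is also 0.6.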
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 60503072 and 60575042.
Zhao Shiqi, Zhang Yu, Zhao Lin, Liu Ting, Li Sheng. Context-Dependent Lexical Paraphrasing Based on Web Mining. Journal of Software, 2009, 20(7): 1746-1755