Structural Query Expansion Based on Weighted Query Term for XML Documents

Fund Project:

Supported by the National Natural Science Foundation of China under Grant No. 60763001; the National Social Science Foundation of China under Grant No. 07BTQ025; and the Key Science-Technology Project of the Education Department of Jiangxi Province of China under Grant No. [2006]320.


    Abstract:

    In information retrieval over text documents, a major cause of low retrieval quality is that users find it hard to formulate a query expression that accurately describes their query intention. XML documents have structural features in addition to the content features of plain text, which makes a precise query even harder to formulate. To address this problem, this paper proposes a query expansion method based on relevance feedback that helps users build a "content + structure" query expression matching their query intention. The method first expands the query keywords to find the weighted expansion terms that best represent the user's intention; it then performs structural query expansion on the basis of these expanded terms; finally, it forms a complete "content + structure" expanded query. Experimental results show that, compared with the unexpanded query, the average precision at prec@10 and prec@20 improves by more than 30%.
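The abstract names the two expansion steps but not the weighting formula, so the following is only a minimal sketch of the pipeline's shape under stated assumptions: each relevance-feedback item is a hypothetical `(element path, element text)` pair; a candidate term's weight is the fraction of relevant elements that contain it (a simple stand-in, not the authors' scheme); the structural step picks the path whose elements carry the most term weight; and the result is written in a NEXI-style `path[about(., terms)]` form. All function names and the toy data are hypothetical. The last function shows prec@k, the measure the evaluation reports.

```python
from collections import Counter, defaultdict

# Hypothetical stop-word list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}


def tokenize(text):
    """Lower-case, split on whitespace, drop stop words, keep unique tokens in first-seen order."""
    return list(dict.fromkeys(w for w in text.lower().split() if w not in STOPWORDS))


def expand_terms(query_terms, relevant_elems, top_n=2):
    """Step 1 (content): add the top_n terms found in the most relevant feedback
    elements, weighted by the fraction of elements containing them.
    Original query terms keep weight 1.0. Simple heuristic, not the paper's formula."""
    df = Counter()
    for _, text in relevant_elems:
        df.update(tokenize(text))  # each term counted at most once per element
    weights = {t: 1.0 for t in query_terms}
    for term, count in df.most_common():
        if len(weights) >= len(query_terms) + top_n:
            break
        if term not in weights:
            weights[term] = count / len(relevant_elems)
    return weights


def expand_structure(weights, relevant_elems):
    """Step 2 (structure): pick the element path whose relevant elements
    accumulate the most weighted-term mass."""
    score = defaultdict(float)
    for path, text in relevant_elems:
        tokens = set(tokenize(text))
        score[path] += sum(w for t, w in weights.items() if t in tokens)
    return max(score, key=score.get)


def build_cas_query(weights, path):
    """Combine structure and weighted content into a NEXI-style query string,
    listing terms from highest to lowest weight."""
    ranked = sorted(weights, key=lambda t: -weights[t])
    return f"{path}[about(., {' '.join(ranked)})]"


def precision_at_k(ranked_ids, relevant_ids, k):
    """prec@k: fraction of the top-k ranked results that are relevant."""
    return sum(1 for r in ranked_ids[:k] if r in relevant_ids) / k


if __name__ == "__main__":
    feedback = [  # toy relevance-feedback data: (element path, element text)
        ("//article//sec", "XML retrieval structure query"),
        ("//article//sec", "XML query expansion feedback"),
        ("//article//abs", "query expansion weighting"),
    ]
    w = expand_terms(["query"], feedback)
    path = expand_structure(w, feedback)
    print(build_cas_query(w, path))  # //article//sec[about(., query xml expansion)]
    print(precision_at_k(["e1", "e5", "e3", "e7"], {"e1", "e3"}, 2))  # 0.5
```

With the toy feedback, the original term `query` keeps weight 1.0, `xml` and `expansion` are added at 2/3 each, and `//article//sec` wins the structural step because both of its elements contain weighted terms. A real system would substitute the paper's own term weights and a proper tokenizer.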

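Cite this article:

Wan Changxuan (万常选), Lu Yuan (鲁远). Structural query expansion based on weighted query term for XML documents. Journal of Software, 2008, 19(10): 2611-2619.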

History
  • Received: 2007-09-10
  • Last revised: 2008-05-29
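Copyright: Institute of Software, Chinese Academy of Sciences
Address: No. 4, Zhongguancun South Fourth Street, Haidian District, Beijing; Postal code: 100190
Tel: 010-62562563  Fax: 010-62562533  Email: jos@iscas.ac.cn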