Conference: European Conference on Information Retrieval Research

A Discourse Search Engine Based on Rhetorical Structure Theory

Abstract

Representing a document as a bag-of-words and using keywords to retrieve relevant documents have seen great success in large-scale information retrieval systems such as Web search engines. The bag-of-words representation is computationally efficient and, with proper term weighting and document ranking methods, can perform surprisingly well for such a simple document representation. However, this representation ignores the rich discourse structure in a document, which could provide useful clues when determining the relevance of a document to a given user query. We develop the first-ever Discourse Search Engine (DSE) that exploits the discourse structure in documents to overcome the limitations associated with bag-of-words document representations in information retrieval. We use Rhetorical Structure Theory (RST) to represent a document as a discourse tree connecting numerous elementary discourse units (EDUs) via discourse relations. Given a query, our discourse search engine can retrieve not only documents relevant to the query, but also individual statements from those documents that describe some discourse relation to the query. We propose several ranking scores that consider the discourse structure in the documents to measure the relevance of a pair of EDUs to a query. Moreover, we combine those individual relevance scores using a random decision forest (RDF) model to create a single relevance score. Despite the numerous challenges of constructing a rich document representation using the discourse relations in a document, our experimental results show that it improves the F-score in an information retrieval task. We publicly release our manually annotated test collection to expedite future research in discourse-based information retrieval.
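To make the idea of retrieving EDU pairs concrete, the following is a minimal sketch and not the authors' implementation. It assumes a flattened encoding in which each discourse relation links exactly two EDUs (a real RST tree nests spans), and it replaces the paper's ranking scores and random decision forest with a plain query-term overlap averaged over the pair. The names `EDU`, `Relation`, `overlap`, `score_pair`, and `search` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EDU:
    """An elementary discourse unit: one clause-like span of text."""
    text: str


@dataclass(frozen=True)
class Relation:
    """A discourse relation between two EDUs (assumed flat encoding)."""
    label: str      # e.g. "CAUSE" or "CONTRAST" (illustrative labels)
    nucleus: EDU
    satellite: EDU


def tokens(text):
    """Lowercase the text and split it into a set of punctuation-stripped words."""
    return {w.strip(".,;:!?").lower() for w in text.split()} - {""}


def overlap(query, edu):
    """Fraction of query terms that also occur in the EDU."""
    q = tokens(query)
    if not q:
        return 0.0
    return len(q & tokens(edu.text)) / len(q)


def score_pair(query, rel):
    """Stand-in for a learned combination: average the two per-EDU scores."""
    return (overlap(query, rel.nucleus) + overlap(query, rel.satellite)) / 2


def search(query, relations, top_k=3):
    """Return up to top_k (score, relation) pairs with a non-zero score."""
    scored = [(score_pair(query, r), r) for r in relations]
    scored = [(s, r) for s, r in scored if s > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Usage: two hand-written relations standing in for parsed documents.
relations = [
    Relation("CAUSE",
             EDU("The battery drains quickly"),
             EDU("because the screen is very bright")),
    Relation("CONTRAST",
             EDU("The camera is excellent"),
             EDU("but the speaker is weak")),
]
results = search("battery drains", relations)
for score, rel in results:
    print(score, rel.label, "|", rel.nucleus.text, "|", rel.satellite.text)
```

Under these assumptions the query matches only the first relation, so the engine returns a statement pair together with the relation that links it, instead of a whole document. A learned model over several such scores, as the abstract describes, would take the place of `score_pair`.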