IEEE/WIC/ACM International Conference on Web Intelligence

Extracting Relevant Snippets from Web Documents through Language Model based Text Segmentation

Abstract

Extracting a query-oriented snippet (or passage) and highlighting the relevant information in a long document can help reduce the result navigation cost for end users. While the traditional approach of highlighting matching keywords helps when the search is keyword-oriented, finding appropriate snippets to represent matches to more complex queries requires novel techniques that can succinctly characterize the relevance of various parts of a document to the given query. In this paper, we present a language-model-based method for accurately detecting the most relevant passages of a given document. Unlike previous work in passage retrieval, which focuses on searching relevance nodes for filtering of preoccupied passages, we focus on query-informed segmentation for snippet extraction. The algorithms presented in this paper are currently being deployed in OASIS, a system to help reduce the navigational load of blind users in accessing Web-based digital libraries.
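The abstract names the ingredients (a language model and query-informed segmentation) but does not spell out the algorithm. As a rough illustration only, and not the authors' method, the sketch below scores each sentence by a smoothed query log-likelihood ratio against the whole document and then picks the contiguous run of sentences with the highest total score. The function names `sentence_scores` and `best_span`, the interpolation weight `lam`, and the add-one background smoothing are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a deliberately simple tokenizer."""
    return re.findall(r"\w+", text.lower())


def sentence_scores(sentences, query, lam=0.5):
    """Log-likelihood ratio of the query under each sentence vs. the document.

    Sentence model: Jelinek-Mercer interpolation with the document model.
    Document model: add-one smoothed unigram counts (an assumption here).
    A positive score means the sentence explains the query better than
    the document as a whole does.
    """
    sent_tokens = [tokenize(s) for s in sentences]
    q_tokens = tokenize(query)
    doc_counts = Counter(t for toks in sent_tokens for t in toks)
    doc_len = sum(doc_counts.values())
    vocab_size = len(set(doc_counts) | set(q_tokens))

    scores = []
    for toks in sent_tokens:
        counts = Counter(toks)
        length = len(toks) or 1  # guard against empty sentences
        score = 0.0
        for w in q_tokens:
            p_doc = (doc_counts[w] + 1) / (doc_len + vocab_size)
            p_sent = (1 - lam) * counts[w] / length + lam * p_doc
            score += math.log(p_sent / p_doc)
        scores.append(score)
    return scores


def best_span(scores):
    """Inclusive (start, end) indices of the maximum-sum contiguous run."""
    best_sum, best = float("-inf"), (0, 0)
    cur_sum, cur_start = 0.0, 0
    for i, s in enumerate(scores):
        if cur_sum <= 0:
            cur_sum, cur_start = s, i  # start a new run here
        else:
            cur_sum += s
        if cur_sum > best_sum:
            best_sum, best = cur_sum, (cur_start, i)
    return best
```

Calling `best_span(sentence_scores(sentences, query))` on a list of sentences returns the index range of the candidate snippet. Using a maximum-sum run lets the snippet boundaries depend on the query rather than on fixed paragraph breaks, which is one plausible reading of "query-informed segmentation".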
