Extracting Relevant Snippets from Web Documents through Language Model based Text Segmentation

Abstract

Extracting a query-oriented snippet (or passage) and highlighting the relevant information in a long document can help reduce the result navigation cost for end users. The traditional approach of highlighting matching keywords helps when the search is keyword oriented, but finding appropriate snippets to represent matches to more complex queries requires novel techniques that can succinctly characterize the relevance of various parts of a document to the given query. In this paper, we present a language model based method for accurately detecting the most relevant passages of a given document. Unlike previous works in passage retrieval, which focus on searching relevance nodes for filtering of preoccupied passages, we focus on query-informed segmentation for snippet extraction. The algorithms presented in this paper are currently being deployed in OASIS, a system to help reduce the navigational load of blind users in accessing Web-based digital libraries.
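The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea of scoring parts of a document against a query with a language model. It assumes a unigram query-likelihood model with Jelinek-Mercer smoothing over fixed-size sentence windows, which is a common baseline and not necessarily the paper's query-informed segmentation. The function names, the `lam` and `window_size` parameters, and the probability floor are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def query_log_likelihood(query_tokens, window_tokens, doc_counts, doc_len, lam=0.5):
    """Log P(query | window) under a unigram model smoothed with the whole document.

    Assumption: Jelinek-Mercer interpolation with weight `lam` (hypothetical default).
    """
    window_counts = Counter(window_tokens)
    window_len = len(window_tokens)
    score = 0.0
    for term in query_tokens:
        p_window = window_counts[term] / window_len if window_len else 0.0
        p_doc = doc_counts[term] / doc_len if doc_len else 0.0
        p = lam * p_window + (1.0 - lam) * p_doc
        # Floor the probability so a term absent from the document avoids log(0).
        score += math.log(max(p, 1e-9))
    return score


def best_snippet(document, query, window_size=2):
    """Return the best-scoring run of `window_size` consecutive sentences."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    doc_tokens = tokenize(document)
    doc_counts = Counter(doc_tokens)
    query_tokens = tokenize(query)
    best_score, best_text = -math.inf, ""
    for start in range(max(1, len(sentences) - window_size + 1)):
        text = " ".join(sentences[start:start + window_size])
        score = query_log_likelihood(
            query_tokens, tokenize(text), doc_counts, len(doc_tokens)
        )
        if score > best_score:
            best_score, best_text = score, text
    return best_text


if __name__ == "__main__":
    doc = (
        "Cats sleep a lot. Dogs bark loudly. "
        "Screen readers help blind users browse the web. "
        "Digital libraries hold many papers."
    )
    print(best_snippet(doc, "blind users web", window_size=1))
```

A fixed window is the simplest possible segmentation. A query-informed method of the kind the abstract describes would presumably choose the passage boundaries themselves based on how relevance changes across the document, rather than sliding a window of constant size.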

Bibliographic details

Source
IEEE/WIC/ACM International Conference on Web Intelligence | 2007 | pp. 287-290 | 4 pages
Authors
Qing Li; K. Selcuk Candan; Yan Qi
Original format: PDF
Chinese Library Classification: Artificial intelligence theory

Similar literature

1. Query Recommendation based terms and relevant documents using language Models [J]. BTIHAL EL GHALI, ABDERRAHIM EL QADI, OMAR EL MIDAOUI. WSEAS Transactions on Information Science and Applications, 2015.
2. SOUTH INDIAN TAMIL LANGUAGE HANDWRITTEN DOCUMENT TEXT LINE SEGMENTATION TECHNIQUE WITH AID OF SLIDING WINDOW AND SKEWING OPERATIONS [J]. SUNANDA DIXIT, Dr. H.N. SURESH. Journal of Theoretical and Applied Information Technology, 2013, no. 2.
3. The Mechanism Analysis of Natural Language Texts in Order to Construct A Model of the Full-text Document [J]. A.S. Lebedev. Science and Technology, 2013, no. 2A.
4. Extracting Relevant Snippets from Web Documents through Language Model based Text Segmentation [C]. Neshati, M., Alijamaat. Web Intelligence (WI), 2007 IEEE/WIC/ACM International Conference on, 2007.
5. Markov random field model based text segmentation and image post processing of complex scanned documents [D]. Haneda, Eri. 2011.
6. Free-text medical document retrieval via phrase-based vector space model [O]. Wenlei Mao, Wesley W. Chu. 2002.
7. Applying automatic text-based detection of deceptive language to police reports: Extracting behavioral patterns from a multi-step classification model to understand how we lie to the police [O]. Lara Quijano-Sánchez, Federico Liberatore, José Camacho-Collados. 2018.
