Towards a semantic book search engine

Abstract

Traditional Information Retrieval (IR) methods were initially used to search and rank web pages. These methods were progressively modified to exploit the peculiarities of the Web, including the use of its hyperlinked structure for relevance ranking. These Web IR techniques, however, are also being applied to search and rank other forms of text collections that are not inherently web documents. Books (especially in PDF form) differ by nature from web pages: they lack an explicit hypertextual structure and therefore cannot be searched and ranked accurately and precisely using traditional approaches. Books contain highly structured content with implicit logical connections among different parts of the same book as well as to related content in other books. These structural semantics and logical connections could be discovered and used to establish a web of books in which logical concepts, images, figures, tables, and other parts are linked with each other, resulting in a semantic graph that a semantic book search engine could exploit for more precise and accurate indexing, searching, ranking, and recommendation. Based on this hypothesis, the paper outlines a high-level architecture for one possible implementation of a semantic book search engine, identifies the potential areas of research for future researchers, and reports on our work in progress in the form of a proposed model. The proposed architecture, if fully implemented, has the potential to better serve the needs of all stakeholders, including authors, publishers, readers, and librarians.
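
The idea of linking book parts into a semantic graph and using those links for ranking can be illustrated with a small sketch. This is not the model proposed in the paper, which the abstract does not specify. The class `BookGraph`, its methods, the `link_weight` parameter, and the sample book parts are all hypothetical, chosen only to show how a part with no matching text (here a figure) could still be retrieved through its link to a matching chapter.

```python
from collections import defaultdict


class BookGraph:
    """Toy semantic graph: nodes are book parts, edges are logical links.

    Hypothetical illustration only; not the architecture from the paper.
    """

    def __init__(self):
        self.text = {}                     # node id -> lowercased text
        self.neighbors = defaultdict(set)  # node id -> linked node ids

    def add_part(self, node_id, text):
        self.text[node_id] = text.lower()

    def link(self, a, b):
        # Treat logical connections as undirected for simplicity.
        self.neighbors[a].add(b)
        self.neighbors[b].add(a)

    def _term_score(self, node_id, terms):
        # Plain term-frequency match on the node's own text.
        words = self.text[node_id].split()
        return sum(words.count(t) for t in terms)

    def search(self, query, link_weight=0.5):
        terms = query.lower().split()
        own = {n: self._term_score(n, terms) for n in self.text}
        scores = {}
        for n in self.text:
            # One-hop boost: a part inherits a share of its neighbors' scores.
            boost = sum(own[m] for m in self.neighbors[n])
            scores[n] = own[n] + link_weight * boost
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return [(n, s) for n, s in ranked if s > 0]


g = BookGraph()
g.add_part("bookA/ch2", "inverted index construction for text retrieval")
g.add_part("bookA/fig2.1", "diagram of posting lists")
g.add_part("bookB/ch5", "ranking with an inverted index and link analysis")
g.add_part("bookB/ch9", "cooking recipes")
g.link("bookA/ch2", "bookA/fig2.1")  # a figure belongs to its chapter
g.link("bookA/ch2", "bookB/ch5")     # related content across books

results = g.search("inverted index")
for node_id, score in results:
    print(node_id, score)
```

In this sketch the figure `bookA/fig2.1` contains neither query term, yet it appears in the results because it is linked to a matching chapter, while the unlinked, unrelated `bookB/ch9` is excluded. A real system would need to discover such links from the book structure rather than have them entered by hand.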
