Searching web data: An entity retrieval and high-performance indexing model

Renaud Delbru; Stephane Campinas; Giovanni Tummarello

Abstract

More and more (semi) structured information is becoming available on the web in the form of documents embedding metadata (e.g., RDF, RDFa, Microformats and others). There are already hundreds of millions of such documents accessible and their number is growing rapidly. This calls for large scale systems providing effective means of searching and retrieving this semi-structured information with the ultimate goal of making it exploitable by humans and machines alike. This article examines the shift from the traditional web document model to a web data object (entity) model and studies the challenges faced in implementing a scalable and high performance system for searching semi-structured data objects over a large heterogeneous and decentralised infrastructure. Towards this goal, we define an entity retrieval model, develop novel methodologies for supporting this model and show how to achieve a high-performance entity retrieval system. We introduce an indexing methodology for semi-structured data which offers a good compromise between query expressiveness, query processing and index maintenance compared to other approaches. We address high-performance by optimisation of the index data structure using appropriate compression techniques. Finally, we demonstrate that the resulting system can index billions of data objects and provides keyword-based as well as more advanced search interfaces for retrieving relevant data objects in sub-second time. This work has been part of the Sindice search engine project at the Digital Enterprise Research Institute (DERI), NUI Galway. The Sindice system currently maintains more than 200 million pages downloaded from the web and is being used actively by many researchers within and outside of DERI.

Bibliographic record

Source
Journal of Web Semantics, 2012, Issue 1, pp. 33-58 (26 pages)
Authors
Renaud Delbru; Stephane Campinas; Giovanni Tummarello
Author affiliations

Digital Enterprise Research Institute, National University of Ireland, Galway, Ireland;

Digital Enterprise Research Institute, National University of Ireland, Galway, Ireland;

Digital Enterprise Research Institute, National University of Ireland, Galway, Ireland; Fondazione Bruno Kessler, Trento, Italy;

Original format: PDF
Language: English
Keywords
sindice; entity search and retrieval; compression; semantic web; semi-structured data; inverted index;

Similar literature

1. Modeling Image Data for Effective Indexing and Retrieval in Large General Image Databases [J]. Xiaoyan Li, Lidan Shou, Gang Chen. IEEE Transactions on Knowledge and Data Engineering, 2008, Issue 11.
2. Microarray retriever: a web-based tool for searching and large scale retrieval of public microarray data [J]. Ivliev Alexander E., 't Hoen Peter A.C., Villerius Michel P. Nucleic Acids Research, 2008, Issue 2.
3. Microarray retriever: a web-based tool for searching and large scale retrieval of public microarray data [J]. Alexander E. Ivliev, Bernd W. Brandt, Johan T. den Dunnen. Nucleic Acids Research, 2008, Suppl. 2.
4. A Node Indexing Scheme for Web Entity Retrieval [C]. Renaud Delbru, Nickolai Toupikov, Michele Catasta. European Semantic Web Conference (ESWC 2010), 2010.
5. An integrated theory of image database modeling, indexing, and content-based retrieval [D]. Rao, Aibing. 2001.
6. Microarray retriever: a web-based tool for searching and large scale retrieval of public microarray data [O]. Alexander E. Ivliev, Peter A. C. 't Hoen, Michel P. Villerius. 2008.
7. Searching Web Data: an Entity Retrieval and High-Performance Indexing Model [O]. Renaud Delbru, Stephane Campinas, Giovanni Tummarello. 2015.
8. Information Storage and Retrieval. Reports on Indexing Theory, Content Analysis, Feedback Searching and Dynamic Document Space [R]. Salton, Gerard. 1975.
