Compressed Web Indexes


Abstract

Web search engines use indexes to efficiently retrieve pages containing specified query terms, as well as pages linking to specified pages. The problem of compressed indexes that permit such fast retrieval has a long history. We consider the problem: assuming that the terms in (or links to) a page are generated from a probability distribution, how compactly can we build such indexes that allow fast retrieval? Of particular interest is the case when the probability distribution is Zipfian (or a similar power law), since these are the distributions that arise on the web. We obtain sharp bounds on the space requirement of Boolean indexes for text documents that follow Zipf's law. In the process we develop a general technique that applies to any probability distribution, not necessarily a power law; this is the first analysis of compression in indexes under arbitrary distributions. Our bounds lead to quantitative versions of rules of thumb that are folklore in indexing. Our experiments on several document collections show that the distribution of terms appears to follow a double-Pareto law rather than Zipf's law. Despite widely varying sets of documents, the index sizes observed in the experiments conform well to our theoretical predictions.
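As a rough illustration of the question the abstract poses, the sketch below computes an information-theoretic lower bound on Boolean index size under a simplifying model that is an assumption here and is not taken from the paper: the term of rank i occurs in a document independently with probability min(1, c / i^alpha), a Zipf-like law. Under that model, Shannon's source coding theorem implies that no lossless encoding can average fewer than the sum of the binary entropies H(p_i) bits per document. The function names and the parameters `alpha` and `c` are hypothetical and chosen only for illustration.

```python
import math
from typing import List


def binary_entropy(p: float) -> float:
    """Entropy in bits of a Bernoulli(p) variable; 0 for p outside (0, 1)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def zipf_term_probs(vocab_size: int, alpha: float = 1.0, c: float = 0.5) -> List[float]:
    """Assumed model (not from the paper): the term of rank i appears in a
    document with probability min(1, c / i**alpha)."""
    return [min(1.0, c / (i ** alpha)) for i in range(1, vocab_size + 1)]


def entropy_bound_bits_per_doc(probs: List[float]) -> float:
    """Lower bound on average bits per document, assuming independent terms."""
    return sum(binary_entropy(p) for p in probs)


def expected_postings_per_doc(probs: List[float]) -> float:
    """Expected number of distinct terms (postings) contributed by one document."""
    return sum(probs)


if __name__ == "__main__":
    probs = zipf_term_probs(vocab_size=100_000)
    bits = entropy_bound_bits_per_doc(probs)
    postings = expected_postings_per_doc(probs)
    print(f"expected postings per document: {postings:.2f}")
    print(f"entropy lower bound: {bits:.2f} bits per document")
    print(f"lower bound per posting: {bits / postings:.2f} bits")
```

Changing `alpha` or swapping in a different probability list shows how the bound responds to the shape of the term distribution; the independence assumption is a simplification and real collections will deviate from it.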


Bibliographic details

Source: International World Wide Web Conference (WWW 09), 2009, pp. 441-450 (10 pages)
Authors: Flavio Chierichetti; Ravi Kumar; Prabhakar Raghavan
Original format: PDF
Chinese Library Classification: Computer networks
Keywords: power law; double-Pareto; index size; compression

