High-order entropy-compressed text indexes


Abstract

We present a novel implementation of compressed suffix arrays exhibiting new tradeoffs between search time and space occupancy for a given text (or sequence) of n symbols over an alphabet Σ, where each symbol is encoded by lg |Σ| bits. We show that compressed suffix arrays use just nH_h + O(n lg lg n / lg_|Σ| n) bits while retaining full text indexing functionalities, such as searching for any pattern sequence of length m in O(m lg |Σ| + polylog(n)) time. The term H_h ≤ lg |Σ| denotes the hth-order empirical entropy of the text, which means that our index is nearly optimal in space apart from lower-order terms, achieving asymptotically the empirical entropy of the text (with a multiplicative constant 1). If the text is highly compressible so that H_h = o(1) and the alphabet size is small, we obtain a text index with o(m) search time that requires only o(n) bits. Further results and tradeoffs are reported in the paper.
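
To make the hth-order empirical entropy (written H_h here) concrete, the following is a minimal Python sketch that computes it for a short string. It is not the authors' construction and does not build a compressed suffix array; it only evaluates the quantity that the leading space term is measured against. The function name `empirical_entropy` is hypothetical, and the sketch assumes the common convention in which each symbol's context is the h symbols that precede it.

```python
from collections import Counter, defaultdict
from math import log2


def empirical_entropy(text: str, h: int = 0) -> float:
    """Return the h-th order empirical entropy of `text` in bits per symbol.

    Hypothetical helper for illustration; assumes the context of a symbol
    is the h symbols immediately before it.
    """
    n = len(text)
    if n == 0:
        return 0.0
    # For every context of length h, count the symbols that follow it.
    followers = defaultdict(Counter)
    for i in range(h, n):
        followers[text[i - h:i]][text[i]] += 1
    total_bits = 0.0
    for counts in followers.values():
        size = sum(counts.values())
        # Zeroth-order entropy of the followers of this context,
        # weighted by the number of symbols that share the context.
        total_bits -= sum(c * log2(c / size) for c in counts.values())
    return total_bits / n


if __name__ == "__main__":
    print(empirical_entropy("abababab", 0))  # 1.0
    print(empirical_entropy("abababab", 1))  # 0.0
    print(empirical_entropy("abcd", 0))      # 2.0
```

For "abababab" the zeroth-order entropy is 1 bit per symbol, but the first-order entropy drops to 0 because each symbol is fully determined by its predecessor. This is the kind of text for which a bound in terms of H_h is far smaller than n lg |Σ| bits. For "abcd" the result equals lg 4 = 2, the upper bound lg |Σ| mentioned in the abstract.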


Bibliographic record

Source
Annual ACM-SIAM Symposium on Discrete Algorithms | 2003 | pp. 841-850 | 10 pages


Authors
Roberto Grossi; Ankur Gupta; Jeffrey Scott Vitter


Original format: PDF


Chinese Library Classification: Computing technology, computer technology


