Efficiently compressing string columnar data using frequent pattern mining


Abstract

In modern column-oriented databases, compression is important for improving I/O throughput and overall database performance. Much string columnar data cannot be compressed by special-purpose algorithms such as run-length encoding or dictionary compression, and the typical choice for it is an LZ77-based compression algorithm such as GZIP or Snappy. These algorithms treat data as a byte block and do not exploit the columnar nature of the data. In this thesis, we develop a compression algorithm using frequent string patterns mined directly from a sample of a string column. The patterns are used as the dictionary phrases for compression. We discuss some interesting properties of frequent patterns in the context of compression, and develop a pruning method to address the cache inefficiencies in indexing the patterns. Experiments show that our compression algorithm outperforms Snappy in compression ratio while retaining compression and decompression speed.
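The general idea in the abstract (mine frequent substrings from a sample of the column, then use them as dictionary phrases) can be illustrated with a toy sketch. This is not the thesis's algorithm: the function names, the length and support thresholds, the savings-based ranking, and the greedy longest-match encoder are all assumptions made for illustration, and the sketch ignores the pruning and cache-efficiency concerns the abstract mentions.

```python
from collections import Counter


def mine_frequent_substrings(sample, min_len=2, max_len=8,
                             min_support=2, max_patterns=255):
    """Return substrings that occur in at least `min_support` sample rows.

    All parameters are hypothetical defaults chosen for this sketch.
    """
    support = Counter()
    for row in sample:
        seen = set()  # count each substring at most once per row
        for n in range(min_len, max_len + 1):
            for i in range(len(row) - n + 1):
                seen.add(row[i:i + n])
        support.update(seen)
    frequent = [(p, c) for p, c in support.items() if c >= min_support]
    # Rank by a rough savings estimate: characters replaced by one code,
    # times the number of rows containing the pattern.
    frequent.sort(key=lambda pc: ((len(pc[0]) - 1) * pc[1], pc[0]),
                  reverse=True)
    return [p for p, _ in frequent[:max_patterns]]


def compress(value, patterns):
    """Greedy longest-match encoding: int = pattern index, str = literal."""
    index = {p: i for i, p in enumerate(patterns)}
    max_len = max((len(p) for p in patterns), default=0)
    tokens = []
    i = 0
    while i < len(value):
        for n in range(min(max_len, len(value) - i), 0, -1):
            code = index.get(value[i:i + n])
            if code is not None:
                tokens.append(code)
                i += n
                break
        else:
            # No dictionary phrase starts here: emit one literal character.
            tokens.append(value[i])
            i += 1
    return tokens


def decompress(tokens, patterns):
    """Invert `compress` by expanding pattern indexes back to phrases."""
    return "".join(patterns[t] if isinstance(t, int) else t for t in tokens)


if __name__ == "__main__":
    column = ["http://example.com/a", "http://example.com/b",
              "http://example.org/c"]
    phrases = mine_frequent_substrings(column)
    encoded = [compress(row, phrases) for row in column]
    assert [decompress(t, phrases) for t in encoded] == column
```

Because the dictionary is built once per column rather than per byte block, every value can be decoded independently given the shared phrase list, which is one way a column-aware scheme can differ from a block-oriented LZ77 compressor.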

Bibliographic record

Author: Wang Xiaojian
Year: 2016
Original format: PDF

