A Dictionary based Compression Scheme for Natural Language Text with Reduced Bit Encoding

Abstract

Data compression, also called compaction, is the process of reducing the amount of data needed to store or transmit a given piece of information, typically by means of encoding techniques. Character encoding, which represents characters using some encoding technique, is closely related to data compression. Encoding is the process of putting a sequence of characters into a specific format for efficient transmission or storage. Data compression covers a vast range of applications, including data communication, data storage, and database development. Two well-known compression techniques, Huffman and LZW, are the ones most commonly used for text compression. In this paper, we propose an effective and straightforward compression technique for large natural-language texts, the 5 Bit Encoding Scheme (5BE), which converts 8-bit characters to 5-bit codes. It is likely to outperform Huffman and LZW in compression ratio. The scheme provides an encoding algorithm that converts any 8-bit character in English and Bangla to 5 bits using a lookup table. The lookup table is built using the Zipf distribution, a discrete distribution of commonly used characters in various languages. After converting the characters to 5 bits, we then compute a k-Series scheme to build a database dictionary. At the cost of storing the dictionary, we compress natural text by 87%. The dictionary is used by both the compression and decompression algorithms and is kept on the client side, so it needs to be constructed only once; hence the benefits of the compression technique are available without interruption. The reverse algorithm for recovering the original data is also presented. We compare our algorithm with the well-known Huffman and LZW techniques, and our experimental results show promising efficiency.
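The abstract does not spell out the lookup table or the bit-packing step, so the Python sketch below is only an illustration under stated assumptions, not the authors' 5BE algorithm. It ranks the characters of a sample text by frequency (a simple stand-in for the Zipf-based ranking), assigns the 32 most frequent characters 5-bit codes (2^5 = 32), and packs those codes into bytes; decoding reverses the process. The function names are hypothetical, characters outside the table are rejected with a `KeyError`, and the k-Series dictionary stage is not modeled. On its own, a fixed 5-bit code saves at most 1 - 5/8 = 37.5% relative to 8-bit characters, before padding.

```python
from collections import Counter

CODE_BITS = 5
TABLE_SIZE = 1 << CODE_BITS  # 32 distinct 5-bit codes


def build_lookup_table(sample_text: str) -> list[str]:
    """Hypothetical table builder: rank characters by frequency in a sample
    (a Zipf-style rank ordering) and keep the TABLE_SIZE most frequent."""
    ranked = [ch for ch, _ in Counter(sample_text).most_common()]
    return ranked[:TABLE_SIZE]


def encode(text: str, table: list[str]) -> tuple[bytes, int]:
    """Pack each character's 5-bit table index into a byte string.

    Returns the packed bytes and the character count. Raises KeyError for
    characters not in the table (no escape mechanism is assumed here).
    """
    index = {ch: i for i, ch in enumerate(table)}
    acc, nbits = 0, 0  # bit accumulator and number of pending bits
    out = bytearray()
    for ch in text:
        acc = (acc << CODE_BITS) | index[ch]
        nbits += CODE_BITS
        while nbits >= 8:  # flush every complete byte
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1  # keep only the bits not yet written
    if nbits:  # pad the last partial byte with zero bits
        out.append((acc << (8 - nbits)) & 0xFF)
    return bytes(out), len(text)


def decode(data: bytes, count: int, table: list[str]) -> str:
    """Reverse of encode: read 5-bit codes until `count` characters are rebuilt."""
    acc, nbits = 0, 0
    chars: list[str] = []
    for byte in data:
        acc = (acc << 8) | byte
        nbits += 8
        while nbits >= CODE_BITS and len(chars) < count:
            nbits -= CODE_BITS
            chars.append(table[(acc >> nbits) & (TABLE_SIZE - 1)])
        acc &= (1 << nbits) - 1
    return "".join(chars)


sample = "the quick brown fox jumps over the lazy dog"
table = build_lookup_table(sample)
packed, n = encode(sample, table)
assert decode(packed, n, table) == sample
print(n, "chars ->", len(packed), "bytes")  # 43 chars -> 27 bytes
```

The character count is returned alongside the bytes because the final byte may carry up to 7 padding bits, which is enough to be misread as an extra 5-bit code.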

Bibliographic Details

Source
IEEE International Conference on Robotics, Automation, Artificial-intelligence and Internet-of-Things | 2019 | 1 v. | 6 pages
Authors
Md. Ashiq Mahmood; K.M. Azharul Hasan;
Original format
PDF
Chinese Library Classification
Computing technology; computer technology
Keywords
encoding; compression; decompression; 5-bit compression; compression ratio;

Similar Literature

1. A fast dynamic compression scheme for natural language texts [J]. Ashutosh Gupta, Suneeta Agarwal. Computers & Mathematics with Applications, 2010, no. 12.

机译：自然语言文本的快速动态压缩方案
2. A Bit-Level Text Compression Scheme Based on the HCDC Algorithm [J]. H. Al-Bahadili, A. Rababaa. International Journal of Computers & Applications, 2010, no. 3.

机译：基于HCDC算法的位级文本压缩方案
3. A Quantum Secret Sharing Scheme Using Orbital Angular Momentum onto Multiple Spin States Based on Fibonacci Compression Encoding [J]. Lai Hong, Luo Ming-Xing, Xu Yong-Jian. Communications in Theoretical Physics, 2018, no. 4.

机译：基于Fibonacci压缩编码的轨道角动量在多个旋转状态下的量子秘密共享方案
4. A Dictionary based Compression Scheme for Natural Language Text with Reduced Bit Encoding [C]. Md. Ashiq Mahmood, K.M. Azharul Hasan. IEEE International Conference on Robotics, Automation, Artificial-intelligence and Internet-of-Things, 2019.

机译：基于字典的自然语言文本的压缩方案
5. Creation of encoding schemes to reduce markup language-based overhead [D]. Larson, Theodore L., 2007.

机译：创建编码方案以减少基于标记语言的开销。
6. Natural Language Processing and Automatic SNOMED-Encoding of Free Text: An Analysis of Free Text Data from a Routine Electronic Patient Record Application with a Parsing Tool Using the German SNOMED II [O]. Joerg H. Hohnloser, Matthias Holzer, Martin R.G. Fischer, 1996.

机译：自然语言处理和自由文本的自动SNOMED编码：使用德语SNOMED II的解析工具对例行电子病历应用中的自由文本数据进行分析
7. SMS Text Compression through IDBE (Intelligent Dictionary based Encoding) for Effective Mobile Storage Utilization [O]. Parul Bhanarkar, Nikhil Jha, 2014.

机译：通过IDBE（基于智能词典的编码）进行短信文本压缩，以实现有效移动存储利用
8. Supporting the Acquisition of New Concepts from Natural Language Texts with a Meaning Dictionary [R]. Haenelt, K., 1991.

机译：用意义词典支持从自然语言文本中获取新概念

A Dictionary based Compression Scheme for Natural Language Text with Reduced Bit Encoding

摘要

著录项

相似文献

相关主题

期刊订阅