Conference: IEEE International Conference on Robotics, Automation, Artificial-intelligence and Internet-of-Things

A Dictionary based Compression Scheme for Natural Language Text with Reduced Bit Encoding



Abstract

Data compression, also called compaction, is the process of reducing the amount of data needed to store or transmit a given piece of information, typically by means of encoding techniques. Character encoding is closely related to data compression, since it represents characters with a particular encoding technique. Encoding is the process of putting a sequence of characters into a specific format for efficient transmission or storage. Data compression covers a wide range of applications, including data communication, data storage and database development. Two well-known compression techniques, Huffman and LZW, are the ones most commonly used for text compression. In this paper, we propose an effective and simple compression technique for large natural text based on a 5-bit encoding scheme that converts 8-bit characters to 5 bits, named the 5 Bit Encoding Scheme (5BE). It is likely to outperform Huffman and LZW in terms of compression ratio. The scheme provides an encoding algorithm that converts any 8-bit character in English and Bangla to 5 bits by using a look-up table. The look-up table is created using the Zipf distribution, a discrete distribution of commonly used characters in different languages. After converting the characters to 5 bits, we apply a k-Series scheme to build a database dictionary. At the cost of storage for the dictionary, we compress a natural text by 87%. The dictionary is used by both the compression and decompression algorithms and is deployed on the client side, so it is constructed only once. The benefits of the compression technique are therefore available without interruption. The reverse algorithm that recovers the original data is also presented. We compare our algorithm with both the well-known Huffman and LZW techniques. Our experimental results show promising efficiency.
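The abstract does not reproduce the actual look-up table or the k-Series dictionary, so the following is only a minimal sketch of the general idea of frequency-ranked 5-bit coding, not the authors' implementation. It assumes an alphabet of at most 32 distinct symbols, ranks them by frequency in a sample text, and packs one 5-bit code per character. The names `build_table`, `encode` and `decode` are hypothetical.

```python
from collections import Counter


def build_table(sample: str) -> dict[str, int]:
    """Rank the characters of `sample` by frequency and assign codes 0..31.

    Assumption: the alphabet fits in 32 symbols; the paper's real table
    (covering English and Bangla) is not given in the abstract.
    """
    ranked = [ch for ch, _ in Counter(sample).most_common()]
    if len(ranked) > 32:
        raise ValueError("a 5-bit code can address at most 32 symbols")
    return {ch: code for code, ch in enumerate(ranked)}


def encode(text: str, table: dict[str, int]) -> bytes:
    """Pack one 5-bit code per character into a byte string."""
    bits = 0
    for ch in text:
        bits = (bits << 5) | table[ch]
    nbits = 5 * len(text)
    pad = (-nbits) % 8  # zero bits appended to reach a whole byte
    return (bits << pad).to_bytes((nbits + pad) // 8, "big")


def decode(data: bytes, length: int, table: dict[str, int]) -> str:
    """Recover `length` characters from a byte string made by `encode`."""
    inverse = {code: ch for ch, code in table.items()}
    pad = len(data) * 8 - 5 * length
    bits = int.from_bytes(data, "big") >> pad  # drop the padding bits
    chars = []
    for i in range(length):
        shift = 5 * (length - 1 - i)
        chars.append(inverse[(bits >> shift) & 0b11111])
    return "".join(chars)


text = "the quick brown fox jumps over the lazy dog"
table = build_table(text)
packed = encode(text, table)
restored = decode(packed, len(text), table)
```

Fixed 5-bit codes on their own shrink 8-bit text by 37.5% (5/8 of the original size), so the 87% figure reported in the abstract presumably also relies on the k-Series dictionary stage, which is not sketched here.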
