首页> 外文期刊>Real-time systems >String Matching Over Compressed Text on Handheld Devices Using Tagged Sub-Optimal Code (TSC)
【24h】

String Matching Over Compressed Text on Handheld Devices Using Tagged Sub-Optimal Code (TSC)

机译:使用标记的次优代码(TSC)在手持设备上对压缩文本进行字符串匹配

获取原文
获取原文并翻译 | 示例
       

摘要

This paper presents Tagged Sub-optimal code (TSC), a new coding technique to speed up string matching over compressed databases on personal digital assistants (PDA). TSC is a variable-length sub-optimal code that supports minimal prefix property. It always determines its codeword boundary without traversing a tree or lookup table. TSC technique may be beneficial in many types of applications: speeding up string matching over compressed text, and speeding decoding process. This paper also presents two algorithms for string matching over compressed text using TSC (SCTT) and the Byte Pair Encoding (BPE) technique (SCTB). Several experiments were conducted to compare the performance of TSC, Byte Pair Encoding (BPE), and Huffman code. Several PDA databases with different record sizes were used: the well-known Calgary dataset and a set of small-sized PDA databases. Experimental results show that SCTT is almost twice as fast as the Huffman-based algorithm. SCTT has also the same performance in search time as the search in uncompressed databases and is faster than the SCTB algorithm. For frequently updated PDA databases such as phone books, to-do list, and memos, SCTT is the recommended method regardless of the size of the average record length, since the time required to compress the updated records using BPE poses significant delays compared to TSC.
机译:本文介绍了标记次优代码(TSC),这是一种新的编码技术,可以加快个人数字助理(PDA)上压缩数据库上的字符串匹配。 TSC是可变长度次优代码,它支持最小前缀属性。它始终在不遍历树或查找表的情况下确定其代码字边界。 TSC技术在许多类型的应用中可能是有益的:加速压缩文本上的字符串匹配,以及加速解码过程。本文还介绍了两种使用TSC(SCTT)和字节对编码(BPE)技术(SCTB)在压缩文本上进行字符串匹配的算法。进行了一些实验,以比较TSC,字节对编码(BPE)和霍夫曼码的性能。使用了几个具有不同记录大小的PDA数据库:著名的卡尔加里数据集和一组小型PDA数据库。实验结果表明,SCTT的速度几乎是基于霍夫曼算法的两倍。 SCTT在搜索时间上的性能也与未压缩数据库中的搜索相同,并且比SCTB算法更快。对于频繁更新的PDA数据库(例如电话簿,待办事项列表和备忘录),无论平均记录长度有多大,推荐使用SCTT方法,因为与TSC相比,使用BPE压缩更新的记录所需的时间明显延迟。 。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号