International Conference on Data and Software Engineering

Computing preset dictionaries from text corpora for the compression of messages

Abstract

Rigid length limits of short messages greatly restrict users' ability to express ideas intelligibly. While data compression can help by enabling greater expressivity in short messages, most work in compression has focused on managing large streams of data instead of small ones. We investigated the potential for preset dictionaries to unleash zlib's ability to compress short messages typical of Short Message Service (SMS) texts, microblog updates, and other single-packet transactions. This paper proposes two preset dictionary generation methods and reports strong test results across two dissimilar text corpora: the Enron database of email messages, and the IEEE VAST Challenge 2011 microblog corpus. For exchanges in English using our proposed methods, it is possible to extend "tweets" from 140 to 197 septets on average, and to extend SMS texts from 160 to 227 septets on average. The preset dictionary's role is as important as zlib's, and each requires the other to obtain these gains.
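
The abstract does not describe the two dictionary generation methods, so the following is only a minimal sketch of the underlying mechanism: passing a preset dictionary to zlib through the `zdict` parameter of Python's standard `zlib` module. The dictionary contents, the sample message, and the helper names `compress_message` and `decompress_message` are hypothetical and chosen for illustration; a real dictionary would be computed from a corpus such as those named above.

```python
import zlib

# Hypothetical hand-written preset dictionary (an assumption, not the
# paper's output). zlib's documentation recommends placing the most
# commonly used strings toward the end of the dictionary.
PRESET_DICT = (
    b"please let me know "
    b"see you tomorrow at the meeting "
    b"thanks for the update "
)


def compress_message(text: str, zdict: bytes = PRESET_DICT) -> bytes:
    """Compress one short message with zlib primed by a preset dictionary."""
    comp = zlib.compressobj(level=9, zdict=zdict)
    return comp.compress(text.encode("utf-8")) + comp.flush()


def decompress_message(data: bytes, zdict: bytes = PRESET_DICT) -> str:
    """Decompress a message; the receiver must hold the same dictionary."""
    decomp = zlib.decompressobj(zdict=zdict)
    return (decomp.decompress(data) + decomp.flush()).decode("utf-8")


if __name__ == "__main__":
    msg = "thanks for the update, see you tomorrow at the meeting"
    plain = zlib.compress(msg.encode("utf-8"), 9)
    primed = compress_message(msg)
    print(len(msg.encode("utf-8")), len(plain), len(primed))
    print(decompress_message(primed) == msg)
```

Both sides must agree on the dictionary in advance, which is why it is "preset": the message itself is too short for zlib to learn useful repetitions on its own. For scale, the averages reported in the abstract correspond to fitting roughly 41% more septets into a message (197/140 ≈ 1.41 for tweets, 227/160 ≈ 1.42 for SMS).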
