Computing preset dictionaries from text corpora for the compression of messages



Abstract

Rigid length limits of short messages greatly restrict users' ability to express ideas intelligibly. While data compression can help by enabling greater expressivity in short messages, most work in compression has focused on managing large streams of data instead of small ones. We investigated the potential for preset dictionaries to unleash zlib's ability to compress short messages typical of Short Message Service (SMS) texts, microblog updates, and other single-packet transactions. This paper proposes two preset dictionary generation methods and reports strong test results across two dissimilar text corpora: the Enron database of email messages, and the IEEE VAST Challenge 2011 microblog corpus. For exchanges in English using our proposed methods, it is possible to extend "tweets" from 140 to 197 septets on average, and to extend SMS texts from 160 to 227 septets on average. The preset dictionary's role is as important as zlib's, and each requires the other to obtain these gains.
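As a rough illustration of the mechanism the abstract describes, the sketch below primes zlib with a preset dictionary through the `zdict` parameter of Python's standard `zlib` module. It is not the authors' dictionary generation method: `PRESET_DICT`, `compress_message`, and `decompress_message` are hypothetical names, and the dictionary contents are hand-picked for this example rather than computed from a corpus.

```python
import zlib

# Hypothetical preset dictionary, hand-written for illustration only.
# In the paper's setting it would be computed from a text corpus.
# zlib's documentation recommends placing the most common substrings
# at the end of the dictionary.
PRESET_DICT = (
    b"please let me know "
    b"thanks for the update "
    b"see you at the meeting tomorrow"
)


def compress_message(message: bytes, zdict: bytes = b"") -> bytes:
    """Compress one short message, optionally priming zlib with zdict."""
    if zdict:
        comp = zlib.compressobj(level=9, zdict=zdict)
    else:
        comp = zlib.compressobj(level=9)
    return comp.compress(message) + comp.flush()


def decompress_message(blob: bytes, zdict: bytes = b"") -> bytes:
    """Decompress a message; zdict must match the one used to compress."""
    if zdict:
        decomp = zlib.decompressobj(zdict=zdict)
    else:
        decomp = zlib.decompressobj()
    return decomp.decompress(blob) + decomp.flush()


message = b"thanks for the update, see you at the meeting tomorrow"
plain = compress_message(message)                # zlib alone
primed = compress_message(message, PRESET_DICT)  # zlib plus preset dictionary

# Compare original size with both compressed sizes.
print(len(message), len(plain), len(primed))
```

Sender and receiver must hold the same dictionary, since it is never transmitted with the message. Read as a size ratio, the abstract's averages suggest compressed messages of roughly 71% of their original length (140/197 ≈ 0.71 and 160/227 ≈ 0.70).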


Bibliographic record

Source: International Conference on Data and Software Engineering, 2014, pp. 1-5 (5 pages)
Authors: Abel Marc W.; Chung Soon M.
Original format: PDF
Keywords: Message compression; SMS messages; performance analysis; preset dictionary; zlib


