Published in: Language Resources and Evaluation

Creating a live, public short message service corpus: the NUS SMS corpus

Abstract

Short Message Service (SMS) messages are short messages sent from one person to another from their mobile phones. They represent a means of personal communication that is an important communicative artifact in our current digital era. As most existing studies have used private access to SMS corpora, comparative studies using the same raw SMS data have not been possible up to now. We describe our efforts to collect a public SMS corpus to address this problem. We use a battery of methodologies to collect the corpus, paying particular attention to privacy issues to address contributors' concerns. Our live project collects new SMS message submissions, checks their quality, and adds valid messages. We release the resultant corpus as XML and as SQL dumps, along with monthly corpus statistics. We opportunistically collect as much metadata about the messages and their senders as possible, so as to enable different types of analyses. To date, we have collected more than 71,000 messages, focusing on English and Mandarin Chinese.
