Annual Meeting of the Association for Computational Linguistics

Work Hard, Play Hard: Email Classification on the Avocado and Enron Corpora

Abstract

In this paper, we present an empirical study of email classification into two main categories, "Business" and "Personal". We train on the Enron email corpus and test on both the Enron and Avocado email corpora. We show that information from the email exchange networks improves classification performance. We represent the email exchange networks as social networks with graph structures. For this classification task, we extract social network features from the graphs in addition to lexical features from the email content, and we compare the performance of SVM and Extra-Trees classifiers using these features. Combining graph features with lexical features improves performance for both classifiers. As a supplementary contribution, we also provide manually annotated sets of the Avocado and Enron email corpora.
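To make the idea of combining graph features with lexical features concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The toy emails are invented. The specific graph features (sender out-degree, sender in-degree, mean and max edge weight to the recipients, recipient count) are illustrative assumptions, not necessarily the ones used in the paper. All function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy emails: (sender, recipients, body). NOT taken from Enron or Avocado.
EMAILS = [
    ("alice", ["bob"], "quarterly budget review meeting"),
    ("bob", ["alice", "carol"], "budget numbers attached for the meeting"),
    ("carol", ["dave"], "dinner on saturday with the kids"),
    ("alice", ["bob"], "contract draft for the client"),
]

def build_graph(emails):
    """Directed email exchange graph: edge (sender, recipient) -> number of emails sent."""
    edges = Counter()
    for sender, recipients, _ in emails:
        for r in recipients:
            edges[(sender, r)] += 1
    return edges

def neighbor_sets(edges):
    """Out-neighbors and in-neighbors of every node in the graph."""
    out_nb, in_nb = defaultdict(set), defaultdict(set)
    for s, r in edges:
        out_nb[s].add(r)
        in_nb[r].add(s)
    return out_nb, in_nb

def graph_features(edges, out_nb, in_nb, sender, recipients):
    """Assumed social-network features for one email (illustrative choice)."""
    weights = [edges[(sender, r)] for r in recipients]
    return [
        float(len(out_nb[sender])),                          # sender out-degree
        float(len(in_nb[sender])),                           # sender in-degree
        sum(weights) / len(weights) if weights else 0.0,     # mean tie strength
        float(max(weights, default=0)),                      # strongest tie
        float(len(recipients)),                              # recipient count
    ]

def build_vocab(emails):
    """Sorted vocabulary of lowercase whitespace tokens."""
    return sorted({w for _, _, body in emails for w in body.lower().split()})

def lexical_features(body, vocab):
    """Bag-of-words counts over the vocabulary."""
    counts = Counter(body.lower().split())
    return [float(counts[w]) for w in vocab]

def email_vectors(emails):
    """Concatenate lexical and graph features for each email."""
    edges = build_graph(emails)
    out_nb, in_nb = neighbor_sets(edges)
    vocab = build_vocab(emails)
    return [
        lexical_features(body, vocab) + graph_features(edges, out_nb, in_nb, s, rs)
        for s, rs, body in emails
    ]

if __name__ == "__main__":
    for vec in email_vectors(EMAILS):
        print(vec[-5:])  # graph part of each feature vector
```

Combined vectors like these could then be passed to scikit-learn's `LinearSVC` (or `SVC`) and `ExtraTreesClassifier` to compare the two classifier families, as the abstract describes. In a train-on-Enron, test-on-Avocado setting, each corpus would need its own exchange graph, because the two corpora involve different people.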