Conference: Graph-based methods for natural language processing workshop 2017

Work Hard, Play Hard: Email Classification on the Avocado and Enron Corpora



Abstract

In this paper, we present an empirical study of classifying emails into two main categories, "Business" and "Personal". We train on the Enron email corpus and test on both the Enron and Avocado email corpora. We show that information from email exchange networks improves classification performance. We represent the email exchange networks as social networks with graph structures. For this classification task, we extract social network features from the graphs in addition to lexical features from the email content, and we compare the performance of SVM and Extra-Trees classifiers using these features. Combining graph features with lexical features improves performance for both classifiers. As a supplementary contribution, we also provide manually annotated sets of the Avocado and Enron email corpora.
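The abstract does not list the exact features, so the following is only a minimal sketch of the general idea: build an email exchange graph and turn each email into one vector that joins lexical features with graph features. Everything here is an assumption made for illustration. The toy emails are hypothetical, bag-of-words counts stand in for whatever lexical features the paper used, and the five graph features (recipient count, sender degree, mean recipient degree, mean tie strength, neighbourhood Jaccard overlap) are an illustrative set and not the authors' list. The names `build_graph`, `graph_features` and `email_features` are likewise hypothetical.

```python
import re
from collections import Counter, defaultdict

# Hypothetical toy emails: (sender, [recipients], body). Not taken from Enron or Avocado.
EMAILS = [
    ("alice@corp", ["bob@corp", "carol@corp"], "Please review the Q3 budget before the meeting."),
    ("bob@corp", ["alice@corp", "carol@corp"], "Budget looks fine, meeting moved to Monday."),
    ("carol@corp", ["dave@home"], "Dinner on Saturday? The kids miss you."),
]


def build_graph(emails):
    """Undirected weighted exchange graph: graph[u][v] = emails exchanged between u and v."""
    graph = defaultdict(Counter)
    for sender, recipients, _ in emails:
        for r in recipients:
            if r != sender:
                graph[sender][r] += 1
                graph[r][sender] += 1
    return dict(graph)


def graph_features(graph, sender, recipients):
    """Illustrative social-network features for one email (an assumed set, not the paper's)."""
    def nbrs(node):
        # Counter of neighbours -> edge weight; empty for unseen addresses.
        return graph.get(node, Counter())

    def jaccard(a, b):
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    n = len(recipients) or 1  # guard against emails with no recipients
    sender_nbrs = set(nbrs(sender))
    return [
        float(len(recipients)),                                # number of recipients
        float(len(nbrs(sender))),                              # sender degree
        sum(len(nbrs(r)) for r in recipients) / n,             # mean recipient degree
        sum(nbrs(sender)[r] for r in recipients) / n,          # mean sender-recipient tie strength
        sum(jaccard(sender_nbrs, set(nbrs(r))) for r in recipients) / n,  # neighbourhood overlap
    ]


def tokenize(text):
    """Lower-case alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocab(emails):
    """Map each token seen in the email bodies to a column index."""
    tokens = sorted({tok for _, _, body in emails for tok in tokenize(body)})
    return {tok: i for i, tok in enumerate(tokens)}


def lexical_features(vocab, body):
    """Bag-of-words counts; out-of-vocabulary tokens are ignored."""
    vec = [0.0] * len(vocab)
    for tok in tokenize(body):
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    return vec


def email_features(graph, vocab, email):
    """Concatenate lexical features and graph features into one vector."""
    sender, recipients, body = email
    return lexical_features(vocab, body) + graph_features(graph, sender, recipients)


if __name__ == "__main__":
    graph = build_graph(EMAILS)
    vocab = build_vocab(EMAILS)
    for email in EMAILS:
        # Print only the five graph features at the end of each vector.
        print(email[0], email_features(graph, vocab, email)[-5:])
```

The resulting rows, together with "Business"/"Personal" labels, could be passed to scikit-learn's `LinearSVC` or `ExtraTreesClassifier` through their `fit(X, y)` method to compare the two classifiers as the abstract describes. For the SVM, the graph columns live on a different scale from the word counts, so standardising features first is usually advisable. For simplicity, the sketch builds the graph from all toy emails. With separate training and test corpora such as Enron and Avocado, each corpus would need its own graph.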
