IEEE International Conference on Signal Processing

Text Categorization of Enron Email Corpus Based on Information Bottleneck and Maximal Entropy

Abstract

This paper addresses text categorization of the Enron email corpus. We use the information bottleneck (IB) method to cluster keywords based on their distribution over the class labels. We then add threads and address groups as additional features alongside the email text, and use the maximal entropy model to improve the accuracy of the classifier. Our experimental results show that these measures improve the classifier's performance, because keywords change too rapidly in emails while address groups are much steadier.
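The keyword-clustering step can be illustrated with a minimal sketch. The abstract does not say which IB variant the authors use, so the code below assumes a greedy agglomerative form: each word starts in its own cluster, and the pair whose merge loses the least information about the class label (a weighted Jensen-Shannon divergence between their class distributions) is merged until k clusters remain. The function names, the input format, and the toy counts are hypothetical and are not taken from the paper.

```python
import math
from itertools import combinations


def kl(p, q):
    """KL(p || q) in nats; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def merge_cost(w1, p1, w2, p2):
    """Information about the class lost by merging two clusters.

    w1, w2 are cluster probabilities; p1, p2 are their class distributions.
    The cost is (w1 + w2) times the weighted Jensen-Shannon divergence.
    """
    total = w1 + w2
    a, b = w1 / total, w2 / total
    mix = [a * x + b * y for x, y in zip(p1, p2)]
    return total * (a * kl(p1, mix) + b * kl(p2, mix))


def cluster_words(counts, k):
    """Greedily merge words into k clusters.

    counts maps each word to a list of per-class counts (assumed input
    format; every word must occur at least once).
    """
    grand = sum(sum(c) for c in counts.values())
    # Each cluster is (set of words, cluster probability, p(class | cluster)).
    clusters = []
    for word, c in counts.items():
        n = sum(c)
        clusters.append(({word}, n / grand, [x / n for x in c]))

    while len(clusters) > k:
        # Pick the pair whose merge loses the least information.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: merge_cost(
                clusters[ij[0]][1], clusters[ij[0]][2],
                clusters[ij[1]][1], clusters[ij[1]][2],
            ),
        )
        (s1, w1, p1), (s2, w2, p2) = clusters[i], clusters[j]
        w = w1 + w2
        merged = (s1 | s2, w, [(w1 * x + w2 * y) / w for x, y in zip(p1, p2)])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)

    return [words for words, _, _ in clusters]


# Toy data (made up): counts of each word in two hypothetical classes.
toy_counts = {
    "meeting": [9, 1],
    "agenda": [8, 2],
    "invoice": [1, 9],
    "payment": [2, 8],
}
word_clusters = cluster_words(toy_counts, 2)
```

On the toy counts, words with similar class distributions end up in the same cluster. In a pipeline like the one described, per-cluster counts could then replace per-word counts as features for the classifier, next to the thread and address-group features.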
