Conference: International Conference on Science in Information Technology

An Analytical Study on Email Classification Using 10-Fold Cross-Validation

Abstract

Start-up companies today face major challenges in managing large volumes of data, now commonly called “Big data” [1]. Arguably, most crucial data appear in email content, so email can be regarded as a valuable database. To extract information from email messages, one possible preprocessing step is to categorize emails by their content. This research proposes a mechanism for classifying emails based on their content and evaluates it using 10-fold cross-validation. The empirical results show a classification accuracy of approximately 61.89%, with a standard deviation of 2.77. The accuracy falls short of the research expectation, possibly because of the variance in word frequency within each category. Future work could therefore weight each word to improve prediction performance.
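The evaluation procedure named in the abstract, 10-fold cross-validation reported as a mean accuracy with a standard deviation, can be sketched as follows. This is a minimal illustration, not the paper's method: the abstract does not describe the classifier, so `train_word_counts` and `predict` are hypothetical stand-ins based on raw word frequency, the toy emails are made up, and the use of the sample standard deviation is an assumption.

```python
import random
import statistics
from collections import Counter, defaultdict


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def train_word_counts(texts, labels):
    """Hypothetical stand-in classifier: per-category word frequency counts."""
    counts = defaultdict(Counter)
    for text, label in zip(texts, labels):
        counts[label].update(text.lower().split())
    return counts


def predict(counts, text):
    """Pick the category whose training word counts best match the text."""
    words = text.lower().split()
    return max(sorted(counts), key=lambda c: sum(counts[c][w] for w in words))


def cross_validate(texts, labels, k=10, seed=0):
    """Return (mean accuracy in %, sample standard deviation in %) over k folds."""
    folds = k_fold_indices(len(texts), k, seed)
    accuracies = []
    for held_out in folds:
        test_idx = set(held_out)
        train_idx = [i for i in range(len(texts)) if i not in test_idx]
        model = train_word_counts([texts[i] for i in train_idx],
                                  [labels[i] for i in train_idx])
        correct = sum(predict(model, texts[i]) == labels[i] for i in held_out)
        accuracies.append(100.0 * correct / len(held_out))
    return statistics.mean(accuracies), statistics.stdev(accuracies)


# Toy, made-up data purely for illustration (not the paper's dataset).
texts = ["invoice payment due", "meeting schedule room"] * 10
labels = ["finance", "meeting"] * 10
mean_acc, std_acc = cross_validate(texts, labels, k=10)
print(f"accuracy: {mean_acc:.2f}% (std {std_acc:.2f})")
```

Each email is held out exactly once, so the mean summarizes performance on unseen messages and the standard deviation indicates how much that performance varies from fold to fold.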