International Conference on Advances in Natural Language Processing (EsTAL 2004), 20–22 October 2004, Alicante, Spain

On Word Frequency Information and Negative Evidence in Naive Bayes Text Classification


Abstract

The Naive Bayes classifier exists in several versions. One, the multi-variate Bernoulli (or binary independence) model, uses binary word-occurrence vectors, while the multinomial model uses word-frequency counts. Many publications cite this difference as the main reason for the superior performance of the multinomial Naive Bayes classifier. We argue that this is not true: we show that when all word-frequency information is eliminated from the document vectors, the multinomial Naive Bayes model performs even better. Moreover, we argue that the main reason for the difference in performance is the way negative evidence, i.e. evidence from words that do not occur in a document, is incorporated into the model. This paper therefore aims at a better understanding and clarification of the difference between the two probabilistic models of Naive Bayes.
