International Conference on Advances in Natural Language Processing (EsTAL 2004), 20–22 October 2004, Alicante, Spain

On Word Frequency Information and Negative Evidence in Naive Bayes Text Classification


Abstract

The Naive Bayes classifier exists in several versions. One, the multi-variate Bernoulli (or binary independence) model, uses binary word-occurrence vectors, while the multinomial model uses word-frequency counts. Many publications cite this difference as the main reason for the superior performance of the multinomial Naive Bayes classifier. We argue that this is not true: we show that when all word-frequency information is eliminated from the document vectors, the multinomial Naive Bayes model performs even better. Moreover, we argue that the main reason for the difference in performance is the way negative evidence, i.e. evidence from words that do not occur in a document, is incorporated into the model. This paper therefore aims at a better understanding and clarification of the difference between the two probabilistic models of Naive Bayes.
