Journal of Supercomputing

Application of improved distributed naive Bayesian algorithms in text classification

Abstract

The naive Bayes classifier is a widely used text classification method that applies statistical theory to the task. Because of the nature of text, related feature items can jointly give rise to new semantic information, and this information may be lost when text is represented with the traditional vector space model. This paper studies the construction and improvement of a distributed naive Bayes automatic classification system, with particular focus on applying Hadoop cloud computing to web page classification. First, the text classification system and the Bayesian classification model are analyzed, covering the representation and extraction of text information, general text classification methods, and Bayesian text classification methods. Then, to address the shortcomings of the naive Bayes text classification method described above, the training stage uses mutual information to measure the correlation among the features retained after feature selection, and appropriately combines features with a high degree of correlation. A series of experiments shows that the improved text classification system achieves better classification results than the unimproved one.
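The training-time improvement described above can be sketched on a single machine as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the abstract does not specify the naive Bayes variant, how mutual information between features is computed, the correlation threshold, or how correlated features are "combined". Here we assume a multinomial naive Bayes model with Laplace smoothing, mutual information between the document-level presence of two terms, an arbitrary example threshold of 0.5, and a merge rule that adds a joint token `a+b` whenever both terms of a correlated pair occur. The helper names (`binary_mi`, `correlated_pairs`, `add_joint_features`) are hypothetical. The Hadoop distribution is not shown; the per-class counting in `fit` is the kind of summation one would expect to run as map and reduce tasks.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def binary_mi(doc_sets, a, b):
    """Mutual information (nats) between presence of terms a and b across documents."""
    n = len(doc_sets)
    # 2x2 contingency table of (a present?, b present?); empty cells contribute 0.
    cells = Counter((a in d, b in d) for d in doc_sets)
    mi = 0.0
    for (xa, xb), count in cells.items():
        p_ab = count / n
        p_a = sum(c for (ya, _), c in cells.items() if ya == xa) / n
        p_b = sum(c for (_, yb), c in cells.items() if yb == xb) / n
        mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi


def correlated_pairs(docs, features, threshold):
    """Hypothetical helper: pairs with MI above threshold that co-occur more than chance."""
    doc_sets = [set(d) for d in docs]
    n = len(doc_sets)
    pairs = []
    for a, b in combinations(sorted(features), 2):
        both = sum(1 for d in doc_sets if a in d and b in d)
        n_a = sum(1 for d in doc_sets if a in d)
        n_b = sum(1 for d in doc_sets if b in d)
        # MI is also high for terms that exclude each other, so keep positive association only.
        if both * n > n_a * n_b and binary_mi(doc_sets, a, b) > threshold:
            pairs.append((a, b))
    return pairs


def add_joint_features(doc, pairs):
    """Hypothetical merge rule: append a joint token 'a+b' when both terms occur."""
    present = set(doc)
    return list(doc) + [f"{a}+{b}" for a, b in pairs if a in present and b in present]


class MultinomialNB:
    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace (add-alpha) smoothing constant

    def fit(self, docs, labels):
        # Per-class document and term counts (the sums a distributed job would compute).
        self.class_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)
        for doc, y in zip(docs, labels):
            self.term_counts[y].update(doc)
        self.vocab = {t for counts in self.term_counts.values() for t in counts}
        self.totals = {y: sum(c.values()) for y, c in self.term_counts.items()}
        return self

    def log_posterior(self, doc, y):
        # Unnormalized log P(y | doc) under the term-independence assumption.
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        score = math.log(self.class_counts[y] / n_docs)
        for t in doc:
            if t in self.vocab:  # terms never seen in training are skipped
                p = (self.term_counts[y][t] + self.alpha) / (self.totals[y] + self.alpha * v)
                score += math.log(p)
        return score

    def predict(self, doc):
        return max(self.class_counts, key=lambda y: self.log_posterior(doc, y))


train = [
    ["cloud", "hadoop", "cluster"],
    ["hadoop", "cluster", "node"],
    ["goal", "match", "team"],
    ["team", "match", "league"],
]
labels = ["tech", "tech", "sport", "sport"]
features = {t for d in train for t in d}
pairs = correlated_pairs(train, features, threshold=0.5)  # threshold is an arbitrary example value
model = MultinomialNB().fit([add_joint_features(d, pairs) for d in train], labels)
print(pairs)  # [('cluster', 'hadoop'), ('match', 'team')]
print(model.predict(add_joint_features(["hadoop", "cluster"], pairs)))  # tech
```

The positive-association check matters because mutual information measures dependence in either direction: in this toy corpus `hadoop` and `match` never co-occur, yet their MI is as high as that of `hadoop` and `cluster`. Adding a joint token rather than replacing the two original terms is another assumption; dropping the originals would be an equally plausible reading of "combine".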