Journal: IEEE Transactions on Knowledge and Data Engineering

Text Document Preprocessing with the Bayes Formula for Classification Using the Support Vector Machine


Abstract

This work implements an enhanced hybrid classification method that combines the naïve Bayes classifier with the Support Vector Machine (SVM). The Bayes formula is used to vectorize (as opposed to classify) a document: each document is represented by a probability distribution over the categories it may belong to. The categories come from a predetermined set of topics, such as those found in the "20 newsgroups" dataset. With these probability distributions as the vectors representing the documents, the SVM can then classify the documents in a multidimensional space. Classifying with only the highest probability from the naïve Bayes classifier causes an inadvertent dimensionality reduction; the SVM overcomes this by using the probability values associated with every category for each document. This method can be used for any dataset and shows a significant reduction in training time compared to the LSquare method, and a significant improvement in classification accuracy compared to pure naïve Bayes systems and TF-IDF/SVM hybrids.
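The vectorization step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a multinomial naïve Bayes model with Laplace smoothing and whitespace tokenization, and the function names `train_naive_bayes` and `bayes_vectorize` and the toy documents are hypothetical. The SVM training step is omitted; the resulting vectors would be passed to an SVM library of choice.

```python
import math
from collections import Counter


def train_naive_bayes(docs, labels):
    """Estimate log-priors and Laplace-smoothed word log-likelihoods per category."""
    vocab = {w for d in docs for w in d.split()}
    classes = sorted(set(labels))
    word_counts = {c: Counter() for c in classes}
    doc_counts = Counter(labels)
    for text, c in zip(docs, labels):
        word_counts[c].update(text.split())
    model = {}
    for c in classes:
        denom = sum(word_counts[c].values()) + len(vocab)
        model[c] = {
            "log_prior": math.log(doc_counts[c] / len(docs)),
            "log_like": {w: math.log((word_counts[c][w] + 1) / denom) for w in vocab},
        }
    return classes, model


def bayes_vectorize(text, classes, model):
    """Return P(category | document) for every category, in the order of `classes`."""
    scores = []
    for c in classes:
        s = model[c]["log_prior"]
        for w in text.split():
            if w in model[c]["log_like"]:  # words outside the vocabulary are ignored
                s += model[c]["log_like"][w]
        scores.append(s)
    # Normalize in log space (log-sum-exp) to avoid underflow on long documents.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


# Toy data (made up for illustration only).
docs = ["goal match team", "team win goal", "cpu gpu memory", "memory disk cpu"]
labels = ["sport", "sport", "tech", "tech"]
classes, model = train_naive_bayes(docs, labels)

# The whole posterior vector, not just its argmax, becomes the SVM input.
vec = bayes_vectorize("team cpu goal", classes, model)
print(classes, vec)
```

Each document is reduced to one value per category, so the SVM works in a space whose dimension equals the number of categories rather than the vocabulary size. Keeping all values, instead of only the largest, is what preserves the information that a pure naïve Bayes decision discards.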
