Conference: International Conference on Machine Tool Technology and Mechatronics Engineering

Research and Implementation of Text Classification Algorithm

Abstract

The development of the Internet and of digital libraries has prompted many text categorization methods. Finding desired information accurately and promptly is becoming increasingly important, and automatic text categorization can help achieve this goal. In general, text classifiers are implemented with traditional classification methods such as Naive Bayes (NB). ARC-BC (Associative Rule-based Classifier by Category) can be used for text categorization by dividing the text documents into subsets in which all documents belong to the same category and generating associative classification rules for each subset. This classifier differs from previous methods in that it consists of association rules between words and categories discovered from the training set. To train and test the classifier, we constructed separate training and testing data sets by selecting documents from Yahoo. The experimental results show that ARC-BC based text categorization is efficient and effective, and that its performance is comparable to Naive Bayes based text categorization.
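The per-category rule generation described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract gives no thresholds, rule format, or scoring scheme, so the function names `mine_rules` and `classify`, the `min_support` and `min_confidence` values, the limit of two words per rule, and the sum-of-confidences scoring are all assumptions made for illustration, as is the toy training data.

```python
from collections import defaultdict
from itertools import combinations


def mine_rules(docs, min_support=0.5, min_confidence=0.6, max_size=2):
    """Mine 'word set -> category' rules, one category subset at a time.

    docs is a list of (set_of_words, category) pairs. All thresholds are
    illustrative assumptions, not values taken from the paper.
    """
    # Split the training set so each subset holds a single category.
    by_category = defaultdict(list)
    for words, category in docs:
        by_category[category].append(frozenset(words))

    rules = []
    for category, subset in by_category.items():
        # Count word sets (up to max_size words) inside this subset only.
        counts = defaultdict(int)
        for words in subset:
            for size in range(1, max_size + 1):
                for combo in combinations(sorted(words), size):
                    counts[frozenset(combo)] += 1

        for termset, count in counts.items():
            # Support is measured relative to the category subset.
            if count / len(subset) < min_support:
                continue
            # Confidence: among all training documents containing the
            # word set, the fraction that belong to this category.
            containing = sum(1 for words, _ in docs if termset <= set(words))
            confidence = count / containing
            if confidence >= min_confidence:
                rules.append((termset, category, confidence))
    return rules


def classify(rules, words):
    """Pick the category whose matching rules have the highest summed
    confidence (an assumed scoring scheme); None if no rule matches."""
    words = set(words)
    scores = defaultdict(float)
    for termset, category, confidence in rules:
        if termset <= words:
            scores[category] += confidence
    return max(scores, key=scores.get) if scores else None


# Hypothetical toy training data, for illustration only.
train = [
    ({"stock", "market", "bank"}, "business"),
    ({"stock", "bank", "loan"}, "business"),
    ({"match", "team", "goal"}, "sports"),
    ({"team", "goal", "bank"}, "sports"),
]
rules = mine_rules(train)
print(classify(rules, {"stock", "bank", "report"}))
print(classify(rules, {"team", "goal"}))
```

Mining each category subset separately means a word set only has to be frequent within its own category, so categories with few documents can still produce rules. A practical system would likely use a proper Apriori-style candidate generation step instead of enumerating every word combination.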
