Source: MATEC Web of Conferences

Research on the internal influence factors of the text multi-classification problem


Abstract

This paper deals with the classification of text data. Statistics show that more than 8,000 articles on the topic can be retrieved online across all kinds of documents. However, few papers examine the factors that affect text classification. The choice of classification method is important, but internal factors sometimes play a large role and can even decide the success or failure of the whole classification task. To address this gap, this paper takes the Rocchio algorithm as the classification method and tests experimentally how five internal factors influence text classification: category clustering density, class complexity, category definition, stop words, and document length. The experiments show that classification accuracy is higher when clustering density is higher, class complexity is lower, and category definition is higher, and that removing stop words improves the result. Document length does not directly affect classification performance, but the length of the documents should be chosen to suit the classification algorithm in use.
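For readers unfamiliar with the method named in the abstract, the following is a minimal sketch of Rocchio classification in its common centroid-based form: each class is represented by the average of its document vectors, and a new document is assigned to the class whose centroid is most similar. The abstract does not give the paper's term weighting, stop-word list, or density measure, so the plain term-frequency vectors, the tiny `STOP_WORDS` set, the `cluster_density` proxy, and the toy documents below are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "of", "and", "is", "in", "to"}


def vectorize(text, remove_stop_words=True):
    """Turn a document into a length-normalized term-frequency vector."""
    tokens = text.lower().split()
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    counts = Counter(tokens)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()} if norm else {}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def train_centroids(docs, labels):
    """Average the document vectors of each class into one centroid."""
    sums = defaultdict(lambda: defaultdict(float))
    sizes = Counter(labels)
    for text, label in zip(docs, labels):
        for term, weight in vectorize(text).items():
            sums[label][term] += weight
    return {c: {t: w / sizes[c] for t, w in vec.items()}
            for c, vec in sums.items()}


def classify(text, centroids):
    """Assign the class whose centroid is most similar to the document."""
    vec = vectorize(text)
    return max(centroids, key=lambda c: cosine(vec, centroids[c]))


def cluster_density(docs, labels, centroids):
    """Assumed proxy for clustering density: mean cosine similarity
    between each class's documents and that class's centroid."""
    sims = defaultdict(list)
    for text, label in zip(docs, labels):
        sims[label].append(cosine(vectorize(text), centroids[label]))
    return {c: sum(s) / len(s) for c, s in sims.items()}


# Toy training data (invented for this example).
docs = [
    "the stock market fell sharply",
    "investors sold shares in the market",
    "the team won the football match",
    "a late goal decided the match",
]
labels = ["finance", "finance", "sports", "sports"]

centroids = train_centroids(docs, labels)
print(classify("shares and the stock price", centroids))
print(classify("the football team scored a goal", centroids))
print(cluster_density(docs, labels, centroids))
```

In this sketch, a class whose documents share more vocabulary yields a higher `cluster_density` value, which is one plausible way to read the abstract's claim that denser categories are easier to classify. Passing `remove_stop_words=False` to `vectorize` gives a simple way to compare results with and without stop words.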
