Journal of Theoretical and Applied Information Technology

ENSEMBLE MULTI-LABEL TEXT CATEGORIZATION BASED ON PYRAMIDAL CLUSTER MEMBERSHIP APPROACH

Abstract

Text categorization is an active area of textual data mining, and interest in it has grown with the explosive increase in textual documents. A document can be associated with multiple categories (e.g., sports, medical, health, and Olympic Games), which creates opportunities for multi-label learning approaches designed specifically for textual data. Text mining is the process of discovering useful knowledge patterns from textual data, and it is one of the foundations of automated text categorization, which is typically carried out by developing novel machine learning (ML) approaches. However, ML models show low expressivity. They are built using a Train-Test scenario, and when an existing model is found deficient, a Train-Test-Retrain cycle is needed, which is time-consuming. In this paper, we propose the "Pyramidal Cluster Membership Approach (PCMO)". It works in two models, namely a training model and a testing model. The training model comprises four phases: Pyramid-Fuzzy Transmutation, a novel k-edge classifier, cluster-to-category mapping, and finding the boundaries. The estimated boundaries are then applied to new textual data to assign categories. Experimental results on the Freebase dataset show that the proposed approach, based on the pyramidal membership method, achieves better classification accuracy than traditional approaches, especially on document categories that involve over-fitting.
