Published in: International Conference on Advanced Informatics: Concepts, Theory and Applications
Automatic multilabel classification for Indonesian news articles

Abstract

Problem transformation and algorithm adaptation are the two main machine learning approaches to multilabel classification. This paper investigates both approaches for the multilabel classification of Indonesian news articles. Because this task involves a large number of features, we also apply feature selection methods to reduce the feature dimension. The paper focuses on four factors: the feature weighting method, the feature selection method, the multilabel classification approach, and the single-label classification algorithm. These factors are combined in experiments to determine the best-performing combination. The experiments show that the best performer for multilabel classification of Indonesian news articles combines the TF-IDF feature weighting method, the Symmetrical Uncertainty feature selection method, Calibrated Label Ranking (which belongs to the problem transformation approach), and the SVM algorithm. This combination achieves an F-measure of 85.13% in 10-fold cross-validation, but the F-measure drops to 76.73% in testing because of out-of-vocabulary (OOV) words.
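To make the feature selection step more concrete, the following is a minimal sketch of the standard Symmetrical Uncertainty score, SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), computed for one feature against one label. It is not the authors' implementation: the abstract does not say how features were discretized or how scores were aggregated across labels, so the sketch assumes binary term-presence features and a single binary label. The names `term_a`, `term_b`, and `is_sport` and the toy values are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def symmetrical_uncertainty(feature, label):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value between 0 and 1."""
    h_x = entropy(feature)
    h_y = entropy(label)
    if h_x + h_y == 0:
        # Both variables are constant, so there is nothing to share.
        return 0.0
    # Joint entropy H(X, Y) over the paired observations.
    h_xy = entropy(list(zip(feature, label)))
    mutual_info = h_x + h_y - h_xy
    return 2.0 * mutual_info / (h_x + h_y)


# Hypothetical toy data: term presence in 8 articles (assumption: binarized
# features) and whether each article carries one particular label.
term_a = [1, 1, 1, 1, 0, 0, 0, 0]    # occurs exactly in the labeled articles
term_b = [1, 0, 1, 0, 1, 0, 1, 0]    # occurs independently of the label
is_sport = [1, 1, 1, 1, 0, 0, 0, 0]

su_a = symmetrical_uncertainty(term_a, is_sport)
su_b = symmetrical_uncertainty(term_b, is_sport)
print(f"SU(term_a, label) = {su_a:.2f}")
print(f"SU(term_b, label) = {su_b:.2f}")
```

In a selection step of this kind, terms would be ranked by their score and only the highest-scoring ones kept. Here `term_a` is perfectly informative about the label and `term_b` carries no information about it, so a ranking would keep the former and discard the latter.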