Journal: Pattern Recognition Letters

A high performance centroid-based classification approach for language identification



Abstract

Centroid-based classification is a machine learning approach used in the text classification domain. The main advantage of centroid-based classifiers is their high performance during both the training stage and the classification stage. However, their success rate can be lower than that of other classifiers if good centroid values are not used. In this paper, we apply the centroid-based classification method to the language identification problem, which can be considered a sub-problem of text classification. We propose a novel method, named inverse class frequency, that updates the classical centroid values to increase their quality. We also use a feature set consisting of individual characters, rather than words or n-gram sequences, to decrease the training and classification times. The experiments were performed on the ECI/MCI corpus, and the method was compared with other methods and previous studies. The results showed that the proposed approach yields high success rates and works very efficiently for language identification.
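The pipeline described in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. The abstract does not give the inverse class frequency formula, so the weight used here, log(1 + number of classes / number of classes containing the character), is an assumption made by analogy with inverse document frequency. The cosine similarity measure, the function names (`char_profile`, `train`, `classify`), and the toy sentences are likewise hypothetical choices for illustration.

```python
import math
from collections import Counter


def char_profile(text):
    """Relative frequency of each character in text (lower-cased)."""
    counts = Counter(text.lower())
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}


def train(samples):
    """samples maps a language label to a list of training strings.

    Returns (weighted_centroids, icf).
    """
    # Classical centroid: the mean of the per-document character profiles.
    centroids = {}
    for label, docs in samples.items():
        acc = Counter()
        for doc in docs:
            for ch, freq in char_profile(doc).items():
                acc[ch] += freq
        centroids[label] = {ch: v / len(docs) for ch, v in acc.items()}

    # ASSUMED weighting (not taken from the paper): characters that occur
    # in fewer class centroids receive a larger weight.
    n_classes = len(centroids)
    class_freq = Counter(ch for c in centroids.values() for ch in c)
    icf = {ch: math.log(1 + n_classes / cf) for ch, cf in class_freq.items()}

    weighted = {
        label: {ch: v * icf[ch] for ch, v in c.items()}
        for label, c in centroids.items()
    }
    return weighted, icf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(text, centroids, icf):
    """Return the label whose weighted centroid is most similar to text."""
    # Characters never seen in training get weight 0 and are ignored.
    profile = {ch: f * icf.get(ch, 0.0) for ch, f in char_profile(text).items()}
    return max(centroids, key=lambda label: cosine(profile, centroids[label]))


# Toy data invented for this sketch; the paper itself uses the ECI/MCI corpus.
samples = {
    "en": ["the quick brown fox jumps over the lazy dog",
           "where are you going with that"],
    "de": ["der schnelle braune fuchs springt über den faulen hund",
           "ich weiß nicht, wohin du gehst"],
    "tr": ["hızlı kahverengi tilki tembel köpeğin üstünden atlar",
           "bugün hava çok güzel değil mi"],
}
centroids, icf = train(samples)
print(classify("what is the weather like today", centroids, icf))
```

Because the feature space is limited to single characters, each centroid holds at most a few dozen entries, so both training and classification reduce to short dictionary passes. This is consistent with the efficiency argument in the abstract, although the accuracy of this toy version on real data is untested.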
