Conference: International Conference on Natural Language Processing

A Corpus Preprocessing Method for Syllable-Level Tibetan Text Classification


Abstract

Text classification is one of the most common and important tasks in applied natural language processing. With the rapid development of machine learning, deep learning has become the mainstream approach to text classification. However, deep learning places high demands on the scale and quality of the corpus, so building large-scale, high-quality corpora is particularly important. To improve the quality of Tibetan text classification corpora, this paper analyzes the current state of research on corpus preprocessing, proposes a syllable-level preprocessing model for Tibetan text classification corpora, and presents the core module of a text normalization algorithm, which we refer to as TC_TCCNL. The proposed method lays a foundation for constructing Tibetan text classification corpora.
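
The abstract does not describe how TC_TCCNL works, so the following is only a minimal sketch of what syllable-level normalization of Tibetan text might look like, not the authors' algorithm. It assumes that syllables are delimited by the tsheg (U+0F0B), that clauses end with a shad (U+0F0D or U+0F0E), and that characters outside the Tibetan Unicode block can be treated as delimiters. The function names `normalize` and `syllables` are hypothetical.

```python
import re
import unicodedata

TSHEG = "\u0f0b"     # Tibetan intersyllabic mark (tsheg)
NB_TSHEG = "\u0f0c"  # non-breaking variant of the tsheg


def normalize(text):
    """Illustrative cleaning step (an assumption, not TC_TCCNL itself)."""
    # Bring the input to a single canonical Unicode form.
    text = unicodedata.normalize("NFC", text)
    # Treat the non-breaking tsheg like the ordinary one.
    text = text.replace(NB_TSHEG, TSHEG)
    # Replace every run of non-Tibetan characters (spaces, Latin, digits)
    # with a tsheg so it acts as a syllable boundary.
    return re.sub(r"[^\u0f00-\u0fff]+", TSHEG, text)


def syllables(text):
    """Split normalized text into syllables on tsheg and shad marks."""
    parts = re.split("[\u0f0b\u0f0d\u0f0e]+", normalize(text))
    return [p for p in parts if p]


if __name__ == "__main__":
    sample = "\u0f56\u0f7c\u0f51\u0f0b\u0f61\u0f72\u0f42 abc \u0f0d"
    print(syllables(sample))
```

A syllable list of this kind could then be fed to a classifier as its token sequence. Which characters to keep or discard is a design choice that a real corpus pipeline would need to settle against its own data.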
