A Corpus Preprocessing Method for Syllable-Level Tibetan Text Classification

Abstract

Text classification is one of the most common and important tasks in applied natural language processing. With the rapid development of machine learning, deep learning has become the mainstream approach to text classification. However, deep learning places high demands on the scale and quality of the corpus, so building large-scale, high-quality corpora is particularly important. To improve the quality of Tibetan text classification corpora, and based on an analysis of the current state of research on corpus preprocessing, this paper proposes a syllable-level preprocessing model for Tibetan text classification corpora and presents the core module of a text normalization algorithm, which we refer to as TC_TCCNL. The proposed method lays a foundation for the construction of Tibetan text classification corpora.
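
As an illustration of what "syllable-level" processing can mean for Tibetan text, the sketch below splits a string into syllables at the tsheg (U+0F0B), the mark that separates Tibetan syllables, and at the shad (U+0F0D), which closes a clause. This is not the TC_TCCNL algorithm, which the abstract does not describe; the function name split_syllables and the choice of NFC normalization are assumptions made for the example only.

```python
import re
import unicodedata

TSHEG = "\u0f0b"  # Tibetan syllable delimiter
SHAD = "\u0f0d"   # Tibetan clause/sentence delimiter

def split_syllables(text: str) -> list[str]:
    """Split Tibetan text into syllables (hypothetical helper, not TC_TCCNL)."""
    # Assumption: apply Unicode NFC so equivalent sequences compare equal.
    text = unicodedata.normalize("NFC", text)
    # Treat runs of tsheg, shad, and whitespace as syllable boundaries.
    parts = re.split(f"[{TSHEG}{SHAD}\\s]+", text)
    # Drop the empty strings that leading or trailing delimiters produce.
    return [p for p in parts if p]
```

Each returned syllable could then serve as a token for a classifier, in place of word-level segmentation.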

Bibliographic record

Source
International Conference on Natural Language Processing | 2021 | pp. 33-36 | 4 pages
Authors
Dao Ji-Zhaxi; Cai Zhi-Jie; Cai Rang-Zhuoma; San Maocuo; Ban Mabao
Original format
PDF
Keywords
Deep learning; Analytical models; Text categorization; Natural language processing; Classification algorithms; Task analysis

Similar literature

1. WordNet-based lexical semantic classification for text corpus analysis [J]. LONG Jun, WANG Lu-da, LI Zu-de, Journal of Central South University (English edition). 2015, Issue 5
2. Text GCN-SW-KNN: a novel collaborative training multi-label classification method for WMS application themes by considering geographic semantics [J]. Zhengyang Wei, Zhipeng Gui, Min Zhang, Big Earth Data (English edition). 2021, Issue 1
3. A Short Text Classification Method Based on N-Gram and CNN [J]. WANG Haitao, HE Jie, ZHANG Xiaohong, Chinese Journal of Electronics (English edition). 2020, Issue 2
4. A method of constructing syllable level Tibetan text classification corpus [J]. Jizhaxi Dao, Zhijie Cai, Rangzhuoma Cai, MATEC Web of Conferences. 2021, Issue a
5. The Influence of Text Preprocessing Methods and Tools on Calculating Text Similarity [J]. Đorđe Petrović, Milena Stanković, Facta Universitatis. Series Mathematics and Informatics. 2019, Issue 5
6. Multi-text classification of Urdu/Roman using machine learning and natural language preprocessing techniques [J]. M Ameen Chhajro, Mansoor Ahmed Khuhro, Kamlesh Kumar, Indian Journal of Science and Technology. 2020, Issue 19
7. Building Large Scale Text Corpus for Tibetan Natural Language Processing by Extracting Text from Web Pages [C]. Huidan LIU, Minghua NUO, Jian WU, 10th Workshop on Asian Language Resources. 2012
8. Linguistic indicators for language understanding: Using machine learning methods to combine corpus-based indicators for aspectual classification of clauses [D]. Siegel, Eric Victor. 1998
9. The influence of preprocessing on text classification using a bag-of-words representation [O]. Yaakov HaCohen-Kerner, Daniel Miller, Yair Yigal, 2020
10. Classificational Paradigm of a Text Corpus by Its Design, Structure and Use, as Well as by the Fixation and Indexation Methods of Its Text Data [O]. Lesia Kotsiuk, Yurii Kotsiuk, 2020
11. Corpus and Method for Identifying Citations in Non-Academic Text (Open Access, Publisher's Version) [R]. He, Y., Meyers, A. 2014
