Moto: Enhancing Embedding with Multiple Joint Factors for Chinese Text Classification


Abstract

Recently, language representation techniques have achieved strong performance in text classification. However, most existing representation models are designed specifically for English and may fail on Chinese because of the large differences between the two languages. In fact, the few existing methods for Chinese text classification process text at only a single level. Yet Chinese characters are a special kind of hieroglyph whose radicals are good semantic carriers. In addition, Pinyin codes carry the semantics of tones, and Wubi codes reflect stroke-structure information. Unfortunately, previous research has not found an effective way to distill the useful parts of these four factors and fuse them. In this work, we propose a novel model called Moto: Enhancing Embedding with Multiple Joint Factors. Specifically, we design an attention mechanism that distills the useful parts by fusing the four levels of information more effectively. We conduct extensive experiments on four popular tasks. The empirical results show that Moto achieves state-of-the-art results: 0.8316 (F1-score, 2.11% improvement) on Chinese news titles, 96.38 (1.24% improvement) on Fudan Corpus, and 0.9633 (3.26% improvement) on THUCNews.
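The abstract does not give the form of the attention mechanism, so the following is only a minimal sketch of one generic way to fuse several factor embeddings with attention (scaled dot-product scoring followed by a softmax-weighted sum). It is not the authors' architecture. The function names, the query vector, the toy embedding values, and the choice of "character" as the fourth factor alongside radical, Pinyin, and Wubi are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_factors(factor_embeddings, query):
    """Fuse per-factor embeddings into one vector with attention.

    factor_embeddings: dict mapping a factor name to a list of floats,
        all of the same length as `query` (hypothetical inputs).
    query: list of floats used to score each factor (in a real model this
        would be learned; here it is a fixed, assumed vector).
    Returns (attention weight per factor, fused embedding).
    """
    dim = len(query)
    names = list(factor_embeddings)
    # Scaled dot-product score between the query and each factor embedding.
    scores = [
        sum(q * x for q, x in zip(query, factor_embeddings[name])) / math.sqrt(dim)
        for name in names
    ]
    weights = softmax(scores)
    # Weighted sum of the factor embeddings, one dimension at a time.
    fused = [
        sum(w * factor_embeddings[name][i] for w, name in zip(weights, names))
        for i in range(dim)
    ]
    return dict(zip(names, weights)), fused


# Toy embeddings for a single token; the values are made up.
factors = {
    "character": [0.2, 0.1, 0.0, 0.4],  # assumed fourth factor
    "radical": [0.5, 0.3, 0.1, 0.0],
    "pinyin": [0.0, 0.2, 0.6, 0.1],
    "wubi": [0.1, 0.0, 0.3, 0.7],
}
query = [1.0, 0.0, 0.5, 0.0]

weights, fused = fuse_factors(factors, query)
for name, weight in weights.items():
    print(f"{name}: {weight:.3f}")
print("fused:", [round(v, 3) for v in fused])
```

Because the weights come from a softmax, they sum to one, and each dimension of the fused vector is a convex combination of the corresponding dimension of the four inputs. Factors whose embeddings align more closely with the query contribute more to the result.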

Bibliographic record

Source: International Conference on Pattern Recognition, 2021, pp. 2882-2888 (7 pages)
Authors: Xunzhu Tang; Rujie Zhu; Tiezhu Sun; Shi Wang
Original format: PDF
Keywords: Fuses; Text categorization; Semantics; Pattern recognition; Task analysis

