Venue: International Conference on Pattern Recognition

Moto: Enhancing Embedding with Multiple Joint Factors for Chinese Text Classification

Abstract

Language representation techniques have recently achieved strong performance in text classification. However, most existing representation models are designed specifically for English text and may fail on Chinese because of the large differences between the two languages. Moreover, the few existing methods for Chinese text classification process text at a single level. Yet Chinese is a special kind of hieroglyphic script, and the radicals of its characters are good semantic carriers. In addition, Pinyin codes carry tonal semantics, and Wubi codes reflect stroke-structure information. Previous research has not found an effective way to distill the useful parts of these four factors and fuse them. In this work, we propose a novel model called Moto: Enhancing Embedding with Multiple Joint Factors. Specifically, we design an attention mechanism that distills the useful parts by fusing the four-level information above more effectively. We conduct extensive experiments on four popular tasks. The empirical results show that Moto achieves state-of-the-art results: 0.8316 (F1-score, 2.11% improvement) on Chinese news titles, 96.38 (1.24% improvement) on the Fudan Corpus, and 0.9633 (3.26% improvement) on THUCNews.
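
The abstract does not spell out how the attention mechanism works, so the following is only a minimal sketch of one common way to fuse several per-factor embeddings: scaled dot-product attention against a single query vector, followed by a weighted sum. The function names, the toy vectors, the query, and the factor name "character" (the abstract names radicals, Pinyin, and Wubi, but not the fourth factor) are all assumptions made for illustration and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_factors(factor_vectors, query):
    """Attention-weighted sum of per-factor embeddings.

    factor_vectors: dict mapping a factor name to an embedding (list of floats).
    query: hypothetical query vector of the same dimension (learned in a real model).
    Returns (fused_vector, weights_by_factor).
    """
    names = list(factor_vectors)
    dim = len(query)
    scale = math.sqrt(dim)
    # One scaled dot-product score per factor.
    scores = [
        sum(q * v for q, v in zip(query, factor_vectors[name])) / scale
        for name in names
    ]
    weights = softmax(scores)
    # Weighted sum of the factor embeddings, dimension by dimension.
    fused = [
        sum(w * factor_vectors[name][i] for w, name in zip(weights, names))
        for i in range(dim)
    ]
    return fused, dict(zip(names, weights))


# Toy 4-dimensional embeddings; all values are made up for illustration.
factors = {
    "character": [0.2, 0.1, 0.0, 0.4],  # assumed fourth factor, not named in the abstract
    "radical": [0.9, 0.0, 0.1, 0.0],
    "pinyin": [0.0, 0.3, 0.3, 0.1],
    "wubi": [0.1, 0.1, 0.8, 0.0],
}
query = [1.0, 0.0, 0.5, 0.0]

fused, weights = fuse_factors(factors, query)
print(weights)
print(fused)
```

In this sketch the attention weights sum to one, so the fused vector is a convex combination of the factor embeddings; factors whose embeddings align more closely with the query contribute more. A trained model would learn the query and embeddings and would likely apply the fusion per character or per token.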