Journal: ACM Transactions on Asian Language Information Processing
Empirical Exploring Word-Character Relationship for Chinese Sentence Representation

Abstract

This article addresses the problem of learning compositional Chinese sentence representations, which represent the meaning of a sentence by composing the meanings of its constituent words. Unlike an English word, a Chinese word is composed of characters that themselves carry rich semantic information, but existing methods have not fully exploited this information. In this work, we introduce a novel mixed character-word architecture that improves Chinese sentence representations by using the semantic information of the characters inside each word. We propose two strategies to this end. The first applies a mask gate to characters, learning the relations among the characters in a word. The second applies a max-pooling operation to words to adaptively find the optimal mixture of the atomic and compositional word representations. Finally, we apply the proposed architecture to various sentence composition models, where it achieves substantial performance gains over baseline models on the sentence similarity task. To further verify the generalization ability of our model, we use the learned sentence representations as features in sentence classification, question classification, and sentence entailment tasks. The results show that the proposed mixed character-word sentence representation models outperform both character-based and word-based models.
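The two strategies named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's equations, so everything below is an assumption made for illustration: the gate form (a sigmoid over each character vector concatenated with the word's average character vector), the parameter names `W` and `b`, the function names, and the choice of element-wise max as the pooling step.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def mask_gate_compose(char_vecs, W, b):
    """Hypothetical compositional word vector: a gated sum of character vectors.

    Assumption: each character's gate is computed from that character plus the
    average of all characters in the word, so the gate can depend on the other
    characters. W has shape dim x (2 * dim); b has length dim.
    """
    dim = len(char_vecs[0])
    avg = [sum(c[k] for c in char_vecs) / len(char_vecs) for k in range(dim)]
    word = [0.0] * dim
    for c in char_vecs:
        pre = matvec(W, c + avg)  # list concatenation: [character ; word average]
        gate = [sigmoid(p + bk) for p, bk in zip(pre, b)]
        for k in range(dim):
            word[k] += gate[k] * c[k]  # element-wise mask applied to the character
    return word


def mix_by_max_pooling(atomic_vec, compositional_vec):
    """Element-wise max over the atomic (word-level) and compositional vectors."""
    return [max(a, c) for a, c in zip(atomic_vec, compositional_vec)]


# Toy two-character word with 2-dimensional embeddings (made-up numbers).
chars = [[1.0, 2.0], [3.0, 4.0]]
W = [[0.0] * 4 for _ in range(2)]  # zero weights make every gate equal 0.5
b = [0.0, 0.0]
atomic = [2.5, -1.0]               # stand-in for a word-level embedding

compositional = mask_gate_compose(chars, W, b)
mixed = mix_by_max_pooling(atomic, compositional)
```

In this sketch the max is taken per dimension, so the mixed vector can draw some dimensions from the word-level embedding and others from the character-composed one. In a trained model `W`, `b`, and the embeddings would be learned jointly with the sentence composition model rather than fixed by hand.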
