Conference on Empirical Methods in Natural Language Processing
Joint Embeddings of Chinese Words, Characters, and Fine-grained Subcharacter Components

Abstract

Word embeddings have attracted much attention recently. Unlike in alphabetic writing systems, Chinese characters are often composed of subcharacter components that are also semantically informative. In this work, we propose an approach to jointly embed Chinese words together with their characters and fine-grained subcharacter components. We use three likelihoods to evaluate whether the context words, characters, and components can predict the current target word, and we collect 13,253 subcharacter components to demonstrate that existing approaches to decomposing Chinese characters are not sufficient. Evaluation on both word similarity and word analogy tasks demonstrates the superior performance of our model.
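
The three-likelihood idea in the abstract can be illustrated with a small sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: each context view (words, characters, components) is averaged into one vector, the target word is scored with a full softmax, and the three log-likelihoods are summed. The function name `joint_log_likelihood`, the table names `W_in`, `C_in`, `S_in`, `W_out`, and the toy sizes are hypothetical and do not come from the paper.

```python
import numpy as np

def log_softmax(scores):
    # Numerically stable log-softmax over a 1-D score vector.
    shifted = scores - scores.max()
    return shifted - np.log(np.exp(shifted).sum())

def joint_log_likelihood(target, ctx_words, ctx_chars, ctx_comps,
                         W_in, C_in, S_in, W_out):
    """Sum of three log-likelihoods of the target word, one per context view.

    Assumption: each view is represented by the mean of its context vectors,
    and the target is predicted with a full softmax over the word vocabulary.
    """
    total = 0.0
    views = ((W_in, ctx_words), (C_in, ctx_chars), (S_in, ctx_comps))
    for table, ids in views:
        h = table[ids].mean(axis=0)              # average this view's context vectors
        total += log_softmax(W_out @ h)[target]  # log P(target | this view)
    return float(total)

# Toy setup (hypothetical sizes and ids, chosen only to make the sketch runnable).
rng = np.random.default_rng(0)
dim, n_words, n_chars, n_comps = 8, 50, 30, 20
W_in = rng.normal(scale=0.1, size=(n_words, dim))   # context word vectors
C_in = rng.normal(scale=0.1, size=(n_chars, dim))   # character vectors
S_in = rng.normal(scale=0.1, size=(n_comps, dim))   # subcharacter component vectors
W_out = rng.normal(scale=0.1, size=(n_words, dim))  # target word vectors

ll = joint_log_likelihood(3, [1, 2, 4, 5], [7, 8, 9], [0, 11, 12, 13],
                          W_in, C_in, S_in, W_out)
print(ll)
```

Training would maximize this quantity over a corpus. A practical implementation would likely replace the full softmax with a cheaper approximation such as negative sampling, which is a further assumption here rather than something the abstract states.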