Joint Embeddings of Chinese Words, Characters, and Fine-grained Subcharacter Components

Abstract

Word embeddings have attracted much attention recently. Unlike words in alphabetic writing systems, Chinese characters are often composed of subcharacter components that are themselves semantically informative. In this work, we propose an approach that jointly embeds Chinese words together with their characters and fine-grained subcharacter components. We use three likelihoods to evaluate whether the context words, characters, and components can predict the current target word, and we collect 13,253 subcharacter components to demonstrate that existing approaches to decomposing Chinese characters are not sufficient. Evaluation on both word similarity and word analogy tasks demonstrates the superior performance of our model.
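The "three likelihoods" idea can be illustrated with a small sketch. The abstract does not say how the context representations are built or how the three terms are combined, so everything below is an assumption made for illustration: each context (words, characters, components) is averaged into one vector, each vector scores the target word through a full softmax, and the three log-likelihoods are summed. The function names, table sizes, and toy indices are hypothetical and are not taken from the paper.

```python
import math
import random


def mean_vector(vectors):
    """Average a list of equal-length vectors element-wise."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def log_softmax_prob(hidden, output_vectors, target):
    """log P(target | hidden) under a full softmax over the output vocabulary."""
    scores = [sum(h * o for h, o in zip(hidden, out)) for out in output_vectors]
    top = max(scores)  # subtract the max score for numerical stability
    log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
    return scores[target] - log_norm


def joint_log_likelihood(target, ctx_words, ctx_chars, ctx_comps,
                         word_in, char_emb, comp_emb, word_out):
    """Sum of three log-likelihoods of the target word (assumed combination)."""
    hiddens = [
        mean_vector([word_in[i] for i in ctx_words]),   # context words
        mean_vector([char_emb[i] for i in ctx_chars]),  # context characters
        mean_vector([comp_emb[i] for i in ctx_comps]),  # subcharacter components
    ]
    return sum(log_softmax_prob(h, word_out, target) for h in hiddens)


def random_table(rows, dim, rng):
    """Toy embedding table with small random values."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(rows)]


rng = random.Random(0)
DIM = 8
word_in = random_table(5, DIM, rng)    # 5 toy words (input vectors)
word_out = random_table(5, DIM, rng)   # 5 toy words (output vectors)
char_emb = random_table(10, DIM, rng)  # 10 toy characters
comp_emb = random_table(12, DIM, rng)  # 12 toy components

ll = joint_log_likelihood(
    target=2,
    ctx_words=[0, 1, 3, 4],
    ctx_chars=[0, 1, 2, 5],
    ctx_comps=[3, 7, 8],
    word_in=word_in, char_emb=char_emb, comp_emb=comp_emb, word_out=word_out,
)
print(ll)
```

Training would maximize this quantity over a corpus. A full softmax is used here only to keep the sketch short; a practical implementation would more likely use an approximation such as negative sampling.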

Bibliographic record

Source
Conference on Empirical Methods in Natural Language Processing | 2017 | pp. 286-291 | 6 pages
Authors
Jinxing Yu; Xun Jian; Hao Xin; Yangqiu Song
Original format: PDF
Indexed: 2022-08-26 15:11:54

Similar literature

1. Joint Chinese Word Segmentation and POS Tagging Using an Error-Driven Word-Character Hybrid Model [J]. Canasai KRUKNGKRA, Kiyotaka UCHIMOTO, Junichi KAZAMA. IEICE Transactions on Information and Systems. 2009, No. 12
2. Learning Chinese word embeddings from character structural information [J]. Bing Ma, Qi Qi, Jianxin Liao. Computer Speech and Language. 2020, No. Mara
3. Out-domain Chinese new word detection with statistics-based character embedding [J]. Liang Yuzhi, Yang Min, Zhu Jia. Natural Language Engineering. 2019, No. PTa2
4. Joint Embeddings of Chinese Words, Characters, and Fine-grained Subcharacter Components [C]. Jinxing Yu, Xun Jian, Hao Xin. Conference on Empirical Methods in Natural Language Processing. 2017
5. The time course of semantic activation in reading Chinese two-character words [D]. Wong, Kin Fai Ellick. 2000
6. Functional Anatomy of Recognition of Chinese Multi-Character Words: Convergent Evidence from Effects of Transposable Nonwords Lexicality and Word Frequency [O]. Nan Lin, Xi Yu, Ying Zhao
7. A Hybrid Classification Method via Character Embedding in Chinese Short Text with Few Words [O]. Yi Zhu, Yun Li, Yongzheng Yue. 2020
