A Word Representation Method Based on Glyph of Chinese Character

Abstract

The quality of word representations has an important impact on natural language processing tasks. Current Chinese word representation methods have three problems: the training data set is huge, model quality depends on the data set, and model stability is poor. To address these problems, a word representation method based on the glyphs of Chinese characters, Glyph2Vec, is proposed. To take full advantage of the semantic information contained in Chinese characters, a glyph auto-encoder is constructed based on a convolutional auto-encoder. The glyph auto-encoder obtains Chinese character embeddings by mapping the glyphs of Chinese characters into a latent low-dimensional semantic space. In Chinese named entity recognition experiments, Glyph2Vec improves the F1 score by 0.77%, 1.84%, and 1.31%, respectively, compared with Word2Vec. The experimental results show that the proposed method outperforms existing results, which demonstrates its effectiveness.
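
The encoding step described in the abstract can be sketched as follows. This is a minimal, dependency-free illustration and not the authors' implementation: the class name `GlyphAutoEncoder`, the 8x8 bitmap `GLYPH_SHI`, the layer sizes, and the dense decoder are all assumptions made for the example, and the weights are random and untrained. It only shows the data flow: a glyph bitmap passes through convolutional filters, is projected to a low-dimensional embedding, and is decoded back to a bitmap whose reconstruction error a training loop would minimize.

```python
import random

# Hypothetical 8x8 bitmap of the character 十 (1 = ink). In practice glyph
# images would be rendered from a font at a higher resolution.
GLYPH_SHI = [
    [0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0],
]


def conv2d(image, kernel, stride=1):
    """Valid 2-D cross-correlation of a single-channel image with one kernel."""
    k = len(kernel)
    out_size = (len(image) - k) // stride + 1
    return [
        [
            sum(image[r * stride + i][c * stride + j] * kernel[i][j]
                for i in range(k) for j in range(k))
            for c in range(out_size)
        ]
        for r in range(out_size)
    ]


def relu(feature_map):
    return [[max(0.0, v) for v in row] for row in feature_map]


def linear(vector, weights):
    """Dense layer without bias; `weights` holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def random_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class GlyphAutoEncoder:
    """Illustrative auto-encoder: convolutional encoder, dense decoder (assumed sizes)."""

    def __init__(self, glyph_size=8, kernel_size=3, num_kernels=4,
                 embedding_dim=6, seed=0):
        rng = random.Random(seed)
        self.glyph_size = glyph_size
        self.kernels = [random_matrix(kernel_size, kernel_size, rng)
                        for _ in range(num_kernels)]
        fmap_size = glyph_size - kernel_size + 1
        flat_size = num_kernels * fmap_size * fmap_size
        self.to_embedding = random_matrix(embedding_dim, flat_size, rng)
        self.to_pixels = random_matrix(glyph_size * glyph_size, embedding_dim, rng)

    def encode(self, glyph):
        """Map a glyph bitmap to a low-dimensional character embedding."""
        features = []
        for kernel in self.kernels:
            fmap = relu(conv2d(glyph, kernel))
            features.extend(v for row in fmap for v in row)
        return linear(features, self.to_embedding)

    def decode(self, embedding):
        """Map an embedding back to a glyph-sized grid of pixel values."""
        pixels = linear(embedding, self.to_pixels)
        n = self.glyph_size
        return [pixels[r * n:(r + 1) * n] for r in range(n)]


def reconstruction_error(glyph, reconstruction):
    """Mean squared error between the input glyph and its reconstruction."""
    diffs = [(a - b) ** 2
             for row_a, row_b in zip(glyph, reconstruction)
             for a, b in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


model = GlyphAutoEncoder()
embedding = model.encode(GLYPH_SHI)
reconstruction = model.decode(embedding)
print(len(embedding), reconstruction_error(GLYPH_SHI, reconstruction))
```

After training, the output of `encode` would serve as the character embedding. With the random weights used here the vector carries no semantic information.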

Bibliographic details

Source
International Conference on Intelligent Transportation, Big Data and Smart City | 2020 | pp. 954-957 | 4 pages
Authors
Shancheng Tang; Puyue Zhang; Xiongxiong Chen; Hanbo Wang; Ming Chen
Original format: PDF
Keywords
Chinese character; word embedding; auto-encoder

Similar literature

1. Visual word density-based nonlinear shape normalization method for handwritten Chinese character recognition [J]. Yunxue Shao, Chunheng Wang, Baihua Xiao. International Journal on Document Analysis and Recognition (IJDAR), 2013, No. 4.
2. Visual word density-based nonlinear shape normalization method for handwritten Chinese character recognition [J]. Yunxue Shao, Chunheng Wang, Baihua Xiao. International Journal on Document Analysis and Recognition, 2013, No. 4.
3. Empirical Exploring Word-Character Relationship for Chinese Sentence Representation [J]. Wang Shaonan, Zhang Jiajun, Zong Chengqing. ACM Transactions on Asian Language Information Processing, 2018, No. 3.
4. Learning Chinese Word Representations From Glyphs Of Characters [C]. Tzu-Ray Su, Hung-Yi Lee. Conference on Empirical Methods in Natural Language Processing, 2017.
5. Animation, simulation, and control of soft characters using layered representations and simplified physics-based methods [D]. Galoppo, Nico. 2008.
6. A Character Level Based and Word Level Based Approach for Chinese-Vietnamese Machine Translation [O]. Phuoc Tran, Dien Dinh, Hien T. Nguyen. 2016.
7. Learning Chinese Word Representations From Glyphs Of Characters [O]. Su, Tzu-Ray, Lee, Hung-Yi. 2017.
