Conference: International Conference on Intelligent Transportation, Big Data and Smart City

A Word Representation Method Based on Glyph of Chinese Character

Abstract

The quality of word representations has an important impact on natural language processing tasks. Current Chinese word representation methods have three problems: they require huge training data sets, model quality depends on the data set, and model stability is poor. To address these problems, a word representation method based on the glyphs of Chinese characters, Glyph2Vec, is proposed. To take full advantage of the semantic information contained in Chinese characters, a glyph auto-encoder is constructed based on a convolutional auto-encoder. The glyph auto-encoder obtains Chinese character embeddings by mapping each character's glyph into a latent low-dimensional semantic space. In experiments on the Chinese named entity recognition task, Glyph2Vec improves the F1 score by 0.77%, 1.84%, and 1.31%, respectively, compared with Word2Vec. The experimental results show that the proposed method outperforms existing results, which demonstrates its effectiveness.
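The abstract does not give the architecture of the glyph auto-encoder, so the following is only a minimal sketch of the general idea: a glyph bitmap is passed through a strided convolutional encoder, the flattened bottleneck serves as the character embedding, and a transposed-convolution decoder reconstructs the bitmap so that a reconstruction loss can be computed. The 7x7 bitmap, the single layer, the two 3x3 kernels, and all function names are assumptions for illustration, and the weights are random rather than trained.

```python
import random

# Hypothetical 7x7 binary bitmap of the character "十"; a real system would
# rasterize each character from a font at a higher resolution.
GLYPH = [[float(ch) for ch in row] for row in (
    "0001000",
    "0001000",
    "0001000",
    "1111111",
    "0001000",
    "0001000",
    "0001000",
)]

def conv2d(image, kernel, stride=2):
    """Valid cross-correlation of a square image with a square kernel."""
    k = len(kernel)
    n = (len(image) - k) // stride + 1
    return [[sum(image[r * stride + i][c * stride + j] * kernel[i][j]
                 for i in range(k) for j in range(k))
             for c in range(n)] for r in range(n)]

def conv_transpose2d(feature, kernel, stride=2):
    """Transposed convolution: scatter each feature value through the kernel."""
    k, n = len(kernel), len(feature)
    size = (n - 1) * stride + k
    out = [[0.0] * size for _ in range(size)]
    for r in range(n):
        for c in range(n):
            for i in range(k):
                for j in range(k):
                    out[r * stride + i][c * stride + j] += feature[r][c] * kernel[i][j]
    return out

def encode(glyph, kernels):
    """Encoder: one strided convolution per kernel, followed by ReLU."""
    return [[[max(0.0, v) for v in row] for row in conv2d(glyph, k)]
            for k in kernels]

def decode(feature_maps, kernels):
    """Decoder: sum the transposed convolutions of all feature maps."""
    parts = [conv_transpose2d(f, k) for f, k in zip(feature_maps, kernels)]
    size = len(parts[0])
    return [[sum(p[r][c] for p in parts) for c in range(size)]
            for r in range(size)]

def mse(a, b):
    """Mean squared reconstruction error between two equally sized images."""
    n = len(a) * len(a[0])
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n

def random_kernel(rng, k=3):
    """Small random 3x3 kernel standing in for learned weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(k)] for _ in range(k)]

rng = random.Random(0)
enc_kernels = [random_kernel(rng) for _ in range(2)]  # untrained encoder weights
dec_kernels = [random_kernel(rng) for _ in range(2)]  # untrained decoder weights

feature_maps = encode(GLYPH, enc_kernels)
# The flattened bottleneck is the character embedding (2 maps x 3 x 3 = 18 values).
embedding = [v for fmap in feature_maps for row in fmap for v in row]
reconstruction = decode(feature_maps, dec_kernels)
loss = mse(GLYPH, reconstruction)  # training would minimize this over all glyphs

print(len(embedding), len(reconstruction), len(reconstruction[0]))
```

In this sketch the 49-pixel glyph is compressed to an 18-value vector. In a trained auto-encoder of this kind, the kernels would be updated to reduce the reconstruction loss, and the resulting bottleneck vectors would be used as character embeddings in a downstream model such as a named entity recognizer.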
