Conference: Beyond Vision and LANguage: inTEgrating Real-world kNowledge Conference

Eigencharacter: An Embedding of Chinese Character Orthography

Abstract

Chinese characters are unique in their logographic nature, which inherently encodes world knowledge through thousands of years of evolution. This paper proposes an embedding approach, the eigencharacter (EC) space, which helps NLP applications easily access the knowledge encoded in Chinese orthography. These EC representations are extracted automatically, encode both structural and radical information, and integrate easily with other computational models. We built EC representations of 5,000 Chinese characters, investigated the orthographic knowledge encoded in ECs, and demonstrated how these ECs identify visually similar characters using both structural and radical information.
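
The abstract does not say how the EC representations are extracted. One plausible reading, by analogy with eigenfaces, is a singular value decomposition over rasterized character images. The sketch below illustrates that reading only and is not the authors' method. The function names `build_ec_space` and `most_similar`, the 16x16 bitmap size, the number of components, and the random toy glyphs are all assumptions made for illustration; real use would need bitmaps rendered from a font.

```python
import numpy as np

def build_ec_space(glyphs, k):
    """Project glyph bitmaps onto their top-k principal directions.

    glyphs: array of shape (n_chars, height, width).
    Returns (mean, components, embeddings). Hypothetical helper, not from the paper.
    """
    n = glyphs.shape[0]
    X = glyphs.reshape(n, -1).astype(float)   # one flattened bitmap per row
    mean = X.mean(axis=0)
    Xc = X - mean                             # center before decomposition
    # Thin SVD: rows of Vt are orthonormal directions in pixel space
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]
    embeddings = Xc @ components.T            # (n_chars, k) coordinates
    return mean, components, embeddings

def most_similar(embeddings, idx, top_n=3):
    """Indices of the top_n rows closest to row idx by cosine similarity."""
    v = embeddings[idx]
    norms = np.linalg.norm(embeddings, axis=1) * np.linalg.norm(v)
    sims = embeddings @ v / np.where(norms == 0, 1.0, norms)
    sims[idx] = -np.inf                       # exclude the query itself
    return np.argsort(-sims)[:top_n]

# Toy data (assumption): five random bitmaps stand in for rendered characters,
# plus a sixth that copies the first with three pixels flipped.
rng = np.random.default_rng(0)
base = (rng.random((5, 16, 16)) > 0.5).astype(float)
near = base[0].copy()
near[0, :3] = 1 - near[0, :3]
glyphs = np.concatenate([base, near[None]], axis=0)

mean, components, emb = build_ec_space(glyphs, k=4)
neighbors = most_similar(emb, 0)
print(neighbors)
```

In this toy setup the near-copy is expected to rank as the closest neighbor of the first glyph, which mirrors the visually-similar-character lookup the abstract describes. Whether such pixel-level components capture structural and radical information is a claim of the paper and is not shown by this sketch.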
