
Improving Chinese Word Representation with Conceptual Semantics



Abstract

The meaning of a word includes a conceptual meaning and a distributional meaning. Distribution-based word embeddings suffer from insufficient conceptual semantic representation caused by data sparsity, especially for low-frequency words. In knowledge bases, manually annotated semantic knowledge is stable, and the essential attributes of words are accurately denoted. In this paper, we propose a Conceptual Semantics Enhanced Word Representation (CEWR) model, which computes the synset embedding and hypernym embedding of Chinese words based on the Tongyici Cilin thesaurus and aggregates them with the distributed word representation, so that both distributional information and conceptual meaning are encoded in the representation of words. We evaluate the CEWR model on two tasks: word similarity computation and short text classification. The Spearman correlations between model results and human judgement are improved to 64.71%, 81.84%, and 85.16% on Wordsim297, MC30, and RG65, respectively. Moreover, CEWR improves the F1 score by 3% in the short text classification task. The experimental results show that CEWR represents words more informatively than distributed word embedding alone. This proves that conceptual semantics, especially hypernymy information, is a good complement to distributed word representation.
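The pipeline summarized in the abstract (aggregate concept-level vectors with a distributional vector, then score the result by Spearman correlation against human similarity ratings) can be sketched roughly as follows. This is only an illustrative sketch: the abstract does not say how CEWR aggregates the vectors, so plain concatenation is an assumption here, and every vector, word, and rating below is made-up toy data rather than anything from the paper or from Tongyici Cilin.

```python
import math

# Toy distributional vectors (made-up values, not from the paper).
word_vec = {"apple": [0.9, 0.1], "pear": [0.8, 0.3], "car": [0.1, 0.9]}
# Hypothetical concept-level vectors: one per synset and one per hypernym
# category of each word (made-up values).
synset_vec = {"apple": [1.0, 0.0], "pear": [0.9, 0.1], "car": [0.0, 1.0]}
hypernym_vec = {"apple": [1.0, 0.0], "pear": [1.0, 0.0], "car": [0.0, 1.0]}


def enhance(word):
    """Aggregate the three vectors by concatenation (an assumed aggregation)."""
    return word_vec[word] + synset_vec[word] + hypernym_vec[word]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Word pairs with made-up human similarity ratings.
pairs = [("apple", "pear"), ("apple", "car"), ("pear", "car")]
human = [9.0, 1.0, 1.5]
model = [cosine(enhance(a), enhance(b)) for a, b in pairs]
rho = spearman(model, human)
print(f"Spearman correlation on toy data: {rho:.2f}")
```

In a real evaluation the word pairs and ratings would come from a benchmark such as those named in the abstract, and the concept vectors from the thesaurus hierarchy; the sketch only shows the shape of the computation.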

Bibliographic details

  • Source
    Computers, Materials & Continua | 2020, Issue 3 | pp. 1897-1913 | 17 pages
  • Author affiliations

    International College for Chinese Studies, Nanjing Normal University, Nanjing 210097, China; School of Chinese Language and Literature, Nanjing Normal University, Nanjing 210097, China;

    School of Chinese Language and Literature, Nanjing Normal University, Nanjing 210097, China; School of Computer Science and Technology, Nanjing Normal University, Nanjing 210023, China;

    School of Computer Science and Technology, Nanjing Normal University, Nanjing 210023, China;

    School of Computer Science and Electronic Engineering, University of Essex, Essex CO4 3DQ, UK;

    School of Computer Science and Technology, Nanjing Normal University, Nanjing 210023, China;

    School of Computer Science and Technology, Nanjing Normal University, Nanjing 210023, China;

  • Original format: PDF
  • Language: English
  • Keywords

    Word representation; conceptual semantics; hypernymy; similarity computation; short text classification;

