TransPhrase: A new method for generating phrase embedding from word embedding in Chinese

Abstract

Currently, there are two main methods of learning phrase embeddings: the distribution method and the composition method. The distribution method treats a phrase as a single unit and learns its embedding from the contexts in which the phrase occurs. Its disadvantages are that it completely ignores the semantics of the phrase's component words and that it suffers from data sparseness. The composition method computes the phrase embedding from the embeddings of the component words, but existing composition methods fail to represent the semantics of phrases well. To address these problems, we take Chinese as an example and propose TransPhrase, a new composition method that generates phrase embeddings from component word embeddings. It is a neural network that uses an LSTM to learn the order information of the component words, an attention mechanism to learn the important information of the component words, and a fully connected network to learn the semantic information of the component words, and finally predicts the phrase embedding. It can solve the data sparseness problem and represent the semantics of phrases properly and fully. Our evaluation on three Chinese phrase-level semantic tasks shows that the overall performance of TransPhrase's phrase representations is better than that of the composition method, the distribution method, and the pre-trained language model.
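
The pipeline named in the abstract (LSTM over the component words, attention over the LSTM states, then a fully connected layer that outputs the phrase embedding) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the class name `TinyPhraseComposer`, the dimensions, the single-vector attention scoring, the omission of bias terms, and the random untrained weights are all assumptions made here, and the abstract does not specify how the real model is wired or trained.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class TinyPhraseComposer:
    """Hypothetical LSTM -> attention -> fully connected composer (untrained)."""

    def __init__(self, dim, hidden, seed=0):
        rng = random.Random(seed)
        self.hidden = hidden
        # One weight matrix per LSTM gate, applied to [x; h_prev]. Biases omitted.
        self.gates = {g: rand_matrix(hidden, dim + hidden, rng) for g in "ifoc"}
        # Assumed attention form: one scoring vector dotted with each state.
        self.attn = [rng.uniform(-0.1, 0.1) for _ in range(hidden)]
        # Fully connected layer mapping back to the word-embedding dimension.
        self.fc = rand_matrix(dim, hidden, rng)

    def lstm(self, word_vecs):
        """Run the LSTM left to right and return one hidden state per word."""
        h = [0.0] * self.hidden
        c = [0.0] * self.hidden
        states = []
        for x in word_vecs:
            xh = list(x) + h
            i = [sigmoid(z) for z in matvec(self.gates["i"], xh)]
            f = [sigmoid(z) for z in matvec(self.gates["f"], xh)]
            o = [sigmoid(z) for z in matvec(self.gates["o"], xh)]
            g = [math.tanh(z) for z in matvec(self.gates["c"], xh)]
            c = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
            h = [oi * math.tanh(ci) for oi, ci in zip(o, c)]
            states.append(h)
        return states

    def compose(self, word_vecs):
        """Return (phrase_embedding, attention_weights) for a list of word vectors."""
        states = self.lstm(word_vecs)
        scores = [sum(a * s for a, s in zip(self.attn, h)) for h in states]
        weights = softmax(scores)
        context = [
            sum(w * h[k] for w, h in zip(weights, states))
            for k in range(self.hidden)
        ]
        return matvec(self.fc, context), weights


# Toy 4-dimensional "word embeddings" for a two-word phrase (made-up values).
composer = TinyPhraseComposer(dim=4, hidden=3)
words = [[0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.2, 0.1]]
phrase_vec, weights = composer.compose(words)
print(len(phrase_vec), len(weights))
```

Because the LSTM reads the words in sequence, swapping their order generally changes the output, which is the order sensitivity that a plain average of word vectors lacks. In a real setting the weights would be learned rather than drawn at random.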

Bibliographic information

  • Source
    Expert Systems with Applications | 2021, Issue 4 | 114387.1-114387.9 | 9 pages
  • Author affiliations

    Harbin Engn Univ Coll Comp Sci & Technol Harbin 150001 Peoples R China|Big Data Applicat Improving Govt Governance Capab Guiyang 550022 Peoples R China|CETC Big Data Res Inst Co Ltd Guiyang 550022 Peoples R China;

    Harbin Engn Univ Coll Comp Sci & Technol Harbin 150001 Peoples R China;

    Harbin Engn Univ Coll Comp Sci & Technol Harbin 150001 Peoples R China;

    Big Data Applicat Improving Govt Governance Capab Guiyang 550022 Peoples R China|CETC Big Data Res Inst Co Ltd Guiyang 550022 Peoples R China;

    Harbin Engn Univ Coll Comp Sci & Technol Harbin 150001 Peoples R China;

  • Indexing information
  • Original format: PDF
  • Text language: English
  • Chinese Library Classification
  • Keywords

    Phrase embedding; Word embedding; Composition method; Neural network;
