Conference: Natural language understanding and intelligent applications
A Word Vector Representation Based Method for New Words Discovery in Massive Text

Abstract

The discovery of new words is of great significance to natural language processing for the Chinese language. In recent years, neural network models that train the words of a corpus into word vector representations have performed well at capturing the semantic relationships among words. Accordingly, word vector representations are introduced here into the discovery of new words in Chinese text. In this work, we propose a new unsupervised method for discovering new words based on the n-gram method. To that end, we first train the words in a corpus into a word vector space and then combine elements of the corpus as candidates for new words. Finally, noisy candidates are dropped based on the similarity between two elements in the word vector space. Compared with classical unsupervised methods such as mutual information and adjacent entropy, the experimental results show that the proposed method has a clear performance advantage in discovering new words.
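
The pipeline described in the abstract (combine adjacent elements into n-gram candidates, then drop noisy candidates by similarity in the vector space) might be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the vectors are trained, which similarity measure is used, or in which direction the threshold is applied. Here the vectors are hand-made toy values standing in for a trained model, the similarity is cosine, and a candidate is kept when its two components are similar. The names `bigram_candidates`, `filter_candidates`, `min_count`, and `threshold` are hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def bigram_candidates(sentences, min_count=1):
    """Combine adjacent elements into bigram candidates (the n-gram step, n = 2)."""
    counts = Counter()
    for tokens in sentences:
        counts.update(zip(tokens, tokens[1:]))
    return [pair for pair, count in counts.items() if count >= min_count]


def filter_candidates(candidates, vectors, threshold=0.5):
    """Keep candidates whose two components are similar in the vector space.

    Assumption: "similar components" means "likely a word". The abstract
    does not state the actual decision rule or threshold.
    """
    kept = []
    for left, right in candidates:
        if left in vectors and right in vectors:
            if cosine(vectors[left], vectors[right]) >= threshold:
                kept.append(left + right)
    return kept


# Toy corpus, already split into single characters.
sentences = [
    ["机", "器", "学", "习"],
    ["机", "器", "人"],
    ["学", "习", "机", "器"],
]

# Hand-made 2-d vectors standing in for trained word vectors (assumption).
vectors = {
    "机": [1.0, 0.1],
    "器": [0.9, 0.2],
    "学": [0.1, 1.0],
    "习": [0.2, 0.9],
    "人": [-1.0, 0.0],
}

candidates = bigram_candidates(sentences)
new_words = filter_candidates(candidates, vectors)
print(new_words)
```

In this toy run, cross-word pairs such as 器+学 fall below the threshold and are discarded, while 机+器 and 学+习 survive. With a real corpus the vectors would come from a trained embedding model, and the threshold would need tuning against labeled data.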