Journal: Journal of computer sciences

Distributional Models with Syntactic Contexts for the Measurement of Word Similarity in Brazilian Portuguese


Abstract

Word similarity provides significant support for natural language processing tasks. Several works use lexical resources such as WordNet for semantic similarity and synonym identification. However, out-of-vocabulary words and missing links between senses are known problems of this approach. Distributional proposals such as word embeddings have been used successfully to address these problems, but their lack of contextual information can prevent even better results. Distributional models that include contextual information can bring advantages to this area, but they are still scarcely explored. Therefore, this work studies the advantages of incorporating syntactic information into distributional models, aiming at better results in semantic similarity approaches. To that end, it explores existing lexical and distributional techniques for measuring word similarity in Brazilian Portuguese. Experiments were carried out with the lexical database WordNet, using different techniques over a standard dataset. The results indicate that word embeddings can cover out-of-vocabulary words and achieve better results than lexical approaches. The main contribution of this article is a new approach that applies syntactic context in the training of word embeddings on a Brazilian Portuguese corpus. Comparing this model with the outcome of the previous experiments shows sound results and reveals relevant complementary aspects.
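
The abstract does not say how syntactic contexts are built, so the following is only an illustrative sketch of the general idea and not the authors' method. It contrasts linear window contexts with dependency-based contexts under one common scheme, in which a head word receives each modifier tagged with its relation and the modifier receives the head tagged with the inverse relation. The example sentence, its hand-written parse, the relation labels, the `_inv` suffix, and the function names are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical, hand-annotated dependency parse of "o gato bebe leite fresco"
# ("the cat drinks fresh milk"). Each token is (index, word, head_index, relation);
# head_index -1 marks the root. A real pipeline would obtain this from a parser.
Token = Tuple[int, str, int, str]

SENTENCE: List[Token] = [
    (0, "o", 1, "det"),
    (1, "gato", 2, "nsubj"),
    (2, "bebe", -1, "root"),
    (3, "leite", 2, "obj"),
    (4, "fresco", 3, "amod"),
]


def window_contexts(tokens: List[Token], size: int = 1) -> List[Tuple[str, str]]:
    """(word, context) pairs from a symmetric linear window of `size` tokens."""
    words = [tok[1] for tok in tokens]
    pairs = []
    for i, word in enumerate(words):
        lo, hi = max(0, i - size), min(len(words), i + size + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((word, words[j]))
    return pairs


def syntactic_contexts(tokens: List[Token]) -> List[Tuple[str, str]]:
    """(word, context) pairs taken from dependency arcs instead of adjacency."""
    pairs = []
    for _, word, head, rel in tokens:
        if head < 0:  # the root has no incoming arc
            continue
        head_word = tokens[head][1]
        pairs.append((head_word, f"{word}/{rel}"))      # head sees its modifier
        pairs.append((word, f"{head_word}/{rel}_inv"))  # modifier sees its head
    return pairs


if __name__ == "__main__":
    print("window:   ", window_contexts(SENTENCE, size=1))
    print("syntactic:", syntactic_contexts(SENTENCE))
```

In this toy sentence, pairs of either kind could be fed to an embedding trainer that accepts arbitrary (word, context) pairs. The syntactic pairs keep the relation label, so "leite" as the object of "bebe" is distinguished from a word that merely appears next to it.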
