International Conference on Language Resources and Evaluation

Word Embedding Evaluation in Downstream Tasks and Semantic Analogies

Abstract

Language models have long been a prolific area of study in Natural Language Processing (NLP). Among the newer and most widely used kinds of language models are Word Embeddings (WE): vector-space representations of a vocabulary, learned by an unsupervised neural network from the contexts in which words appear. WE have been widely adopted as input features for textual data in downstream tasks across many areas of NLP. This paper presents the evaluation of newly released WE models for the Portuguese language, trained on a corpus of 4.9 billion tokens. The first evaluation is an intrinsic task in which the WEs must correctly complete semantic and syntactic analogies. The second is an extrinsic evaluation in which the WE models are used in two downstream tasks: Named Entity Recognition and Semantic Similarity between Sentences. Our results show that a diverse and comprehensive corpus can often outperform a larger but less textually diverse one, and that feeding the text to the WE training algorithm in parts may degrade embedding quality.
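To make the evaluations described above concrete, here is a minimal sketch in Python using gensim of how an intrinsic analogy test and a simple averaged-vector sentence-similarity baseline are typically run. The model file (embeddings_pt.vec), the analogy test set (analogies_pt.txt), and the example words and sentences are hypothetical placeholders; the abstract does not specify the paper's actual models, test sets, or downstream architectures.

```python
from gensim.models import KeyedVectors
import numpy as np

# Load pre-trained embeddings in word2vec text format (hypothetical path).
wv = KeyedVectors.load_word2vec_format("embeddings_pt.vec")

# --- Intrinsic evaluation: semantic analogies ---
# "rei" (king) - "homem" (man) + "mulher" (woman) should land near
# "rainha" (queen) if the semantic relation was captured.
print(wv.most_similar(positive=["rei", "mulher"], negative=["homem"], topn=1))

# gensim can also score a whole analogy test set (one four-word analogy
# per line), which is how intrinsic evaluations like this are usually run.
accuracy, _sections = wv.evaluate_word_analogies("analogies_pt.txt")
print(f"Analogy accuracy: {accuracy:.3f}")

# --- Sentence similarity baseline ---
# A common simple baseline for semantic similarity between sentences is
# the cosine similarity of averaged word vectors (an illustration only;
# not necessarily the method the paper used).
def sentence_vector(sentence):
    vecs = [wv[w] for w in sentence.lower().split() if w in wv]
    return np.mean(vecs, axis=0)

a = sentence_vector("o menino joga bola")
b = sentence_vector("a criança brinca com a bola")
cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(f"Sentence similarity: {cosine:.3f}")
```

A pipeline along these lines would report analogy accuracy for the intrinsic task and correlate the baseline similarity scores with human judgments for the extrinsic one; the downstream NER task would instead feed the vectors into a sequence-labeling model.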