Venue: IEEE International Conference on Data Mining Workshops

Intrinsic or Extrinsic Evaluation: An Overview of Word Embedding Evaluation



Abstract

Compared with traditional methods, word embedding is an efficient language representation that can learn syntax and semantics using neural networks. As a result, more and more promising experiments in natural language processing (NLP) achieve state-of-the-art results by introducing word embeddings. In principle, embedding representation learning embeds words into a low-dimensional vector space, so the resulting vectors can initialize NLP tasks such as text classification, sentiment analysis, and language understanding. However, polysemy is very common in many languages; it causes word ambiguity, which in turn degrades system accuracy. Additionally, language models based on the distributional hypothesis mostly focus on word properties rather than morphology, which leads to inconsistent performance across different evaluations. At the same time, embedding learning and embedding measurement are two vital components of word representation. In this paper, we survey many language models, covering both single-sense and multi-sense word embeddings, and many evaluation approaches, covering both intrinsic and extrinsic evaluation. We find that there are obvious gaps between vectors and human annotations in word similarity evaluation, and that language models achieving good performance in intrinsic evaluations do not necessarily produce similar results in extrinsic evaluations. To the best of our knowledge, there is no universal language model or embedding learning method for most NLP tasks, and each evaluation also hides inherent defects compared to human knowledge. We further investigate the datasets used in intrinsic and extrinsic evaluations. We believe this overview will benefit the design of improved evaluation datasets and more rational evaluation methods.
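The gap the abstract describes between vectors and human annotations in word similarity evaluation is typically measured by correlating model similarities with human ratings. A minimal sketch of this intrinsic protocol follows; the embeddings and human scores are hypothetical toy values (real evaluations use trained vectors and datasets such as WordSim-353), and the helper functions are illustrative, not from the paper.

```python
import math

# Hypothetical toy embeddings and human similarity ratings.
embeddings = {
    "cat": [0.9, 0.1, 0.3],
    "dog": [0.8, 0.2, 0.4],
    "car": [0.1, 0.9, 0.7],
}
human_scores = {("cat", "dog"): 9.0, ("cat", "car"): 2.0, ("dog", "car"): 2.5}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def spearman(xs, ys):
    """Spearman rank correlation (no tie handling; fine for toy-sized data)."""
    def ranks(vals):
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        r = [0.0] * len(vals)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

pairs = list(human_scores)
model_scores = [cosine(embeddings[a], embeddings[b]) for a, b in pairs]
gold = [human_scores[p] for p in pairs]
print(round(spearman(model_scores, gold), 3))  # perfect rank agreement on this toy data → 1.0
```

A high rank correlation here does not guarantee strong downstream (extrinsic) performance, which is precisely the mismatch the paper reports between intrinsic and extrinsic evaluations.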
