Eighth Joint Conference on Lexical and Computational Semantics

Beyond Context: A New Perspective for Word Embeddings


Abstract

Most word embeddings today are trained by optimizing a language modeling goal of scoring words in their context, modeled as a multi-class classification problem. Despite the successes of this assumption, it is incomplete: in addition to its context, orthographic or morphological aspects of words can offer clues about their meaning. In this paper, we define a new modeling framework for training word embeddings that captures this intuition. Our framework is based on the well-studied problem of multi-label classification and, consequently, exposes several design choices for featurizing words and contexts, loss functions for training, and score normalization. Indeed, standard models such as CBOW and fastText are specific choices along each of these axes. We show via experiments that by combining feature engineering with embedding learning, our method can outperform CBOW using only 10% of the training data, in both the standard word embedding evaluations and also text classification experiments.
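The contrast the abstract draws can be sketched concretely: a CBOW-style model scores a target word against the whole vocabulary with a softmax (multi-class), while the multi-label view scores each candidate word independently with a sigmoid. The toy sketch below, with a hypothetical four-word vocabulary and random vectors, is only an illustration of the two scoring regimes, not the paper's actual model; a fastText-style featurization would additionally sum character n-gram vectors into each word's representation.

```python
import math
import random

random.seed(0)
vocab = ["the", "cat", "sat", "mat"]
dim = 8

def rand_vec():
    return [random.uniform(-0.5, 0.5) for _ in range(dim)]

# One input (context) and one output (target) vector per word.
# A fastText-style model would also sum character n-gram vectors
# (e.g. "<ca", "cat", "at>") into each word's representation.
emb_in = {w: rand_vec() for w in vocab}
emb_out = {w: rand_vec() for w in vocab}

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def context_vector(context):
    """Average the input vectors of the context words (CBOW-style)."""
    vecs = [emb_in[w] for w in context]
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def cbow_scores(context):
    """Multi-class view (CBOW): softmax over the whole vocabulary,
    so the scores of all words compete and sum to 1."""
    h = context_vector(context)
    logits = {w: dot(emb_out[w], h) for w in vocab}
    m = max(logits.values())
    exps = {w: math.exp(l - m) for w, l in logits.items()}
    z = sum(exps.values())
    return {w: e / z for w, e in exps.items()}

def multilabel_scores(context):
    """Multi-label view: an independent sigmoid per word, so several
    targets can be scored as plausible at once."""
    h = context_vector(context)
    return {w: 1 / (1 + math.exp(-dot(emb_out[w], h))) for w in vocab}

soft = cbow_scores(["the", "sat"])   # probabilities sum to 1
multi = multilabel_scores(["the", "sat"])  # each in (0, 1), independent
```

The key design difference is in the normalization: the softmax couples all vocabulary scores, whereas the sigmoid decouples them, which is what lets the multi-label framing treat several feature-derived targets as simultaneously correct.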

