
Transfer Joint Embedding for Cross-Domain Named Entity Recognition



Abstract

Named Entity Recognition (NER) is a fundamental task in information extraction from unstructured text. Most previous machine-learning-based NER systems are domain-specific, which implies that they may perform well only on certain domains (e.g., Newswire) but tend to adapt poorly to other related but different domains (e.g., Weblog). Recently, transfer learning techniques have been proposed for NER. However, most transfer learning approaches to NER are developed for binary classification, whereas NER is by nature a multiclass classification problem. Therefore, one has to first reduce the NER task to multiple binary classification tasks and solve them independently. In this article, we propose a new transfer learning method, named Transfer Joint Embedding (TJE), for cross-domain multiclass classification, which can fully exploit the relationships between classes (labels) and reduce the difference in data distributions across domains for transfer learning. More specifically, we aim to embed both labels (outputs) and high-dimensional features (inputs) from different domains (e.g., a source domain and a target domain) into a unified low-dimensional latent space, in which 1) each label is represented by a prototype and the intrinsic relationships between labels can be measured by Euclidean distance; 2) the distance between the data distributions of the source and target domains can be reduced; and 3) each labeled source-domain example lies closer to its corresponding label prototype than to the others. After the latent space is learned, classification of the target domain data can be done with a simple nearest-neighbor rule in the latent space. Furthermore, in order to scale up TJE, we propose an efficient learning algorithm based on stochastic gradient descent (SGD). Finally, we apply the proposed TJE method to NER across different domains on the ACE 2005 dataset, a benchmark in Natural Language Processing (NLP). Experimental results demonstrate the effectiveness of TJE and show that it can outperform state-of-the-art transfer learning approaches to NER.
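To make the ideas in the abstract concrete, below is a minimal NumPy sketch of the three ingredients it describes: a shared projection of high-dimensional features into a low-dimensional latent space, one prototype per label in that space, a term that pulls source and target embeddings together, and nearest-prototype classification of target data. Everything here (the class TJESketch, the margin loss, the single-sample distribution-matching term, and all hyperparameters) is an illustrative assumption, not the authors' actual objective or implementation.

```python
import numpy as np

rng = np.random.default_rng(0)


class TJESketch:
    """Toy joint embedding: features and label prototypes share one latent space."""

    def __init__(self, n_features, n_labels, latent_dim=50, margin=1.0, beta=0.1, lr=0.01):
        self.W = 0.01 * rng.standard_normal((n_features, latent_dim))  # feature projection
        self.P = 0.01 * rng.standard_normal((n_labels, latent_dim))    # one prototype per label
        self.margin, self.beta, self.lr = margin, beta, lr

    def sgd_step(self, x_src, y_src, x_tgt):
        """One stochastic update on a labeled source example and an unlabeled target example."""
        z_s, z_t = x_src @ self.W, x_tgt @ self.W
        # Margin term: the true label's prototype should be closer to z_s (by `margin`)
        # than the most-violating other prototype.
        d = np.sum((self.P - z_s) ** 2, axis=1)
        wrong = np.argmin(np.where(np.arange(len(d)) == y_src, np.inf, d))
        if d[y_src] + self.margin > d[wrong]:
            grad_z = 2 * (self.P[wrong] - self.P[y_src])              # d(loss)/d(z_s)
            self.P[y_src] -= self.lr * 2 * (self.P[y_src] - z_s)      # pull true prototype toward z_s
            self.P[wrong] += self.lr * 2 * (self.P[wrong] - z_s)      # push violating prototype away
            self.W -= self.lr * np.outer(x_src, grad_z)
        # Distribution-matching term (a single-sample stand-in for a moment-matching penalty):
        # shrink the gap between the source and target embeddings.
        gap = z_s - z_t
        self.W -= self.lr * self.beta * np.outer(x_src - x_tgt, gap)

    def predict(self, X):
        """Assign each example the label of its nearest prototype in the latent space."""
        Z = X @ self.W
        d2 = ((Z[:, None, :] - self.P[None, :, :]) ** 2).sum(-1)
        return d2.argmin(axis=1)


# Hypothetical usage with random vectors standing in for NER feature vectors:
model = TJESketch(n_features=1000, n_labels=8)
for _ in range(200):
    x_s, x_t = rng.standard_normal(1000), rng.standard_normal(1000)
    model.sgd_step(x_s, int(rng.integers(8)), x_t)
print(model.predict(rng.standard_normal((5, 1000))))
```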

Bibliographic Information

  • Source
    ACM Transactions on Information Systems | 2013, Issue 2 | pages 7.1-7.27 | 27 pages
  • Author Affiliations

    Data Analytics Department, Institute for Infocomm Research, 1 Fusionopolis Way, #21-01 Connexis, South Tower, Singapore 138632;

    Data Analytics Department, Institute for Infocomm Research, 1 Fusionopolis Way, #21-01 Connexis, South Tower, Singapore 138632;

    Data Analytics Department, Institute for Infocomm Research, 1 Fusionopolis Way, #21-01 Connexis, South Tower, Singapore 138632;

  • Indexed in: Science Citation Index (SCI); Engineering Index (EI)
  • Format: PDF
  • Language: English
  • Chinese Library Classification:
  • Keywords

    Named entity recognition; transfer learning; multiclass classification;

