
Transfer Joint Embedding for Cross-Domain Named Entity Recognition



Abstract

Named Entity Recognition (NER) is a fundamental task in information extraction from unstructured text. Most previous machine-learning-based NER systems are domain-specific, which implies that they may perform well only on certain domains (e.g., Newswire) but tend to adapt poorly to other related but different domains (e.g., Weblog). Recently, transfer learning techniques have been proposed for NER. However, most transfer learning approaches to NER are developed for binary classification, whereas NER is by nature a multiclass classification problem. Therefore, one has to first reduce the NER task to multiple binary classification tasks and solve them independently. In this article, we propose a new transfer learning method, named Transfer Joint Embedding (TJE), for cross-domain multiclass classification, which can fully exploit the relationships between classes (labels) and reduce the difference in data distributions across domains for transfer learning. More specifically, we aim to embed both labels (outputs) and high-dimensional features (inputs) from different domains (e.g., a source domain and a target domain) into a unified low-dimensional latent space, in which 1) each label is represented by a prototype and the intrinsic relationships between labels can be measured by Euclidean distance; 2) the distance between the data distributions of the source and target domains can be reduced; and 3) each labeled source-domain example lies closer to its corresponding label prototype than to the others. After the latent space is learned, classification of the target domain data can be done with a simple nearest-neighbor rule in the latent space. Furthermore, in order to scale up TJE, we propose an efficient learning algorithm based on stochastic gradient descent (SGD). Finally, we apply the proposed TJE method to NER across different domains on the ACE 2005 dataset, a benchmark in Natural Language Processing (NLP). Experimental results demonstrate the effectiveness of TJE and show that it can outperform state-of-the-art transfer learning approaches to NER.
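To make the ideas in the abstract concrete, below is a minimal NumPy sketch of the three ingredients it describes: a shared projection of high-dimensional features into a low-dimensional latent space, one prototype per label in that space, a term that pulls source and target embeddings together, and nearest-prototype classification of target data. Everything here (the class TJESketch, the margin loss, the single-sample distribution-matching term, and all hyperparameters) is an illustrative assumption, not the authors' actual objective or implementation.

```python
import numpy as np

rng = np.random.default_rng(0)


class TJESketch:
    """Toy joint embedding: features and label prototypes share one latent space."""

    def __init__(self, n_features, n_labels, latent_dim=50, margin=1.0, beta=0.1, lr=0.01):
        self.W = 0.01 * rng.standard_normal((n_features, latent_dim))  # feature projection
        self.P = 0.01 * rng.standard_normal((n_labels, latent_dim))    # one prototype per label
        self.margin, self.beta, self.lr = margin, beta, lr

    def sgd_step(self, x_src, y_src, x_tgt):
        """One stochastic update on a labeled source example and an unlabeled target example."""
        z_s, z_t = x_src @ self.W, x_tgt @ self.W
        # Margin term: the true label's prototype should be closer to z_s (by `margin`)
        # than the most-violating other prototype.
        d = np.sum((self.P - z_s) ** 2, axis=1)
        wrong = np.argmin(np.where(np.arange(len(d)) == y_src, np.inf, d))
        if d[y_src] + self.margin > d[wrong]:
            grad_z = 2 * (self.P[wrong] - self.P[y_src])              # d(loss)/d(z_s)
            self.P[y_src] -= self.lr * 2 * (self.P[y_src] - z_s)      # pull true prototype toward z_s
            self.P[wrong] += self.lr * 2 * (self.P[wrong] - z_s)      # push violating prototype away
            self.W -= self.lr * np.outer(x_src, grad_z)
        # Distribution-matching term (a single-sample stand-in for a moment-matching penalty):
        # shrink the gap between the source and target embeddings.
        gap = z_s - z_t
        self.W -= self.lr * self.beta * np.outer(x_src - x_tgt, gap)

    def predict(self, X):
        """Assign each example the label of its nearest prototype in the latent space."""
        Z = X @ self.W
        d2 = ((Z[:, None, :] - self.P[None, :, :]) ** 2).sum(-1)
        return d2.argmin(axis=1)


# Hypothetical usage with random vectors standing in for NER feature vectors:
model = TJESketch(n_features=1000, n_labels=8)
for _ in range(200):
    x_s, x_t = rng.standard_normal(1000), rng.standard_normal(1000)
    model.sgd_step(x_s, int(rng.integers(8)), x_t)
print(model.predict(rng.standard_normal((5, 1000))))
```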

Bibliographic Information

  • Source
    ACM Transactions on Information Systems | 2013, Issue 2 | pages 7.1-7.27 | 27 pages
  • Author Affiliations

    Data Analytics Department, Institute for Infocomm Research, 1 Fusionopolis Way, #21-01 Connexis, South Tower, Singapore 138632;

    Data Analytics Department, Institute for Infocomm Research, 1 Fusionopolis Way, #21-01 Connexis, South Tower, Singapore 138632;

    Data Analytics Department, Institute for Infocomm Research, 1 Fusionopolis Way, #21-01 Connexis, South Tower, Singapore 138632;

  • Indexed in: Science Citation Index (SCI); Engineering Index (EI)
  • Format: PDF
  • Language: English
  • Chinese Library Classification:
  • Keywords

    Named entity recognition; transfer learning; multiclass classification;

