
Introduction


Abstract

The growth in computational power and the rise of Deep Neural Networks (DNNs) have revolutionized the field of Natural Language Processing (NLP). The ability to collect massive datasets, together with the capacity to train big models on powerful GPUs, has yielded NLP-based technology that was beyond imagination only a few years ago. Unfortunately, this technology is still limited to a handful of resource-rich languages and domains. This is because most NLP algorithms rely on the fundamental assumption that the training and test sets are drawn from the same underlying distribution. When the training and test distributions do not match (a phenomenon known as domain shift), such models are likely to suffer performance drops. Despite the growing availability of heterogeneous data, many NLP domains still lack the amounts of labeled data required to feed data-hungry neural models, and in some domains and languages even unlabeled data is scarce. As a result, the problem of domain adaptation, that is, training an algorithm on annotated data from one or more source domains and applying it to other target domains, is a fundamental challenge that must be solved in order to make NLP technology available for most of the world's languages and textual domains.
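
To make the domain-shift assumption concrete, the following minimal sketch (not from the paper; the toy review data and the sentiment task are hypothetical, and scikit-learn is assumed to be available) fits a bag-of-words sentiment classifier on labeled text from a "source" domain and evaluates it both in-domain and on a different "target" domain, where accuracy typically drops:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score

    # Hypothetical toy data: sentiment-labeled reviews from two domains.
    source_texts = ["great camera, sharp photos", "battery died after a week",
                    "lens is crisp and bright", "screen cracked, terrible build"]
    source_labels = [1, 0, 1, 0]   # product reviews (source domain)
    target_texts = ["the plot was gripping from start to finish",
                    "boring characters and a flat ending"]
    target_labels = [1, 0]         # movie reviews (target domain)

    # Fit the vocabulary and the classifier on the source domain only.
    vectorizer = TfidfVectorizer()
    X_source = vectorizer.fit_transform(source_texts)
    model = LogisticRegression().fit(X_source, source_labels)

    # Target-domain words unseen during training are simply dropped by the
    # vectorizer, one concrete face of the train/test distribution mismatch.
    X_target = vectorizer.transform(target_texts)

    print("in-domain accuracy:    ", accuracy_score(source_labels, model.predict(X_source)))
    print("target-domain accuracy:", accuracy_score(target_labels, model.predict(X_target)))

Domain adaptation methods aim to close this gap, for example by leveraging unlabeled target-domain text to better align the source and target distributions before or during training.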
