Venue: Conference of the European Chapter of the Association for Computational Linguistics

LanideNN: Multilingual Language Identification on Character Window



Abstract

In language identification, a common first step in natural language processing, we want to automatically determine the language of some input text. Monolingual language identification assumes that the given document is written in a single language. In multilingual language identification, the document is usually written in two or three languages, and the goal is only to name them. We go one step further and propose a method for textual language identification in which languages can change arbitrarily and the goal is to identify the spans of text written in each language. Our method is based on bidirectional recurrent neural networks, and it performs well on both monolingual and multilingual language identification tasks across six datasets covering 131 languages. The method remains accurate on short documents and across domains, which makes it well suited to off-the-shelf use without preparing training data.
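To make the idea of span-level identification concrete, here is a minimal sketch, not the authors' implementation. It assumes a per-character tagger: a toy bidirectional Elman RNN (`birnn_char_scores`, a hypothetical name) reads the character sequence left-to-right and right-to-left, concatenates the two hidden states at each position, and projects them to one score per language. A second hypothetical helper, `labels_to_spans`, collapses consecutive identical per-character labels into `(start, end, language)` spans. The weights are random and untrained, the plain tanh cells stand in for whatever recurrent units the paper actually uses, and the paper's character-window processing is not reproduced here.

```python
import numpy as np


def birnn_char_scores(char_ids, n_langs, vocab_size, hidden=16, seed=0):
    """Toy bidirectional Elman RNN returning one score vector per character.

    Assumption: random, untrained weights; this only illustrates the data flow
    (embed -> forward pass + backward pass -> concatenate -> project).
    """
    rng = np.random.default_rng(seed)
    emb = rng.normal(scale=0.1, size=(vocab_size, hidden))
    w_f, u_f = rng.normal(scale=0.1, size=(2, hidden, hidden))  # forward cell
    w_b, u_b = rng.normal(scale=0.1, size=(2, hidden, hidden))  # backward cell
    w_out = rng.normal(scale=0.1, size=(2 * hidden, n_langs))

    x = emb[np.asarray(char_ids)]  # (T, hidden) character embeddings
    T = x.shape[0]
    h_fwd = np.zeros((T, hidden))
    h_bwd = np.zeros((T, hidden))

    h = np.zeros(hidden)
    for t in range(T):  # left-to-right pass: context before position t
        h = np.tanh(x[t] @ w_f + h @ u_f)
        h_fwd[t] = h

    h = np.zeros(hidden)
    for t in reversed(range(T)):  # right-to-left pass: context after position t
        h = np.tanh(x[t] @ w_b + h @ u_b)
        h_bwd[t] = h

    both = np.concatenate([h_fwd, h_bwd], axis=1)  # (T, 2 * hidden)
    return both @ w_out  # (T, n_langs) unnormalized language scores


def labels_to_spans(labels):
    """Collapse per-character labels into (start, end, label) spans, end exclusive."""
    spans = []
    for i, lab in enumerate(labels):
        if spans and spans[-1][2] == lab:
            spans[-1] = (spans[-1][0], i + 1, lab)  # extend the current span
        else:
            spans.append((i, i + 1, lab))  # language changes: open a new span
    return spans


if __name__ == "__main__":
    text = "Hello světe"
    # Hand-written gold labels: "Hello " is English, "světe" is Czech.
    gold = ["en"] * 6 + ["cs"] * 5
    print(labels_to_spans(gold))  # [(0, 6, 'en'), (6, 11, 'cs')]

    # Untrained model: predictions are arbitrary, only the shapes are meaningful.
    vocab = {c: i for i, c in enumerate(sorted(set(text)))}
    langs = ["cs", "en"]
    scores = birnn_char_scores([vocab[c] for c in text], len(langs), len(vocab))
    predicted = [langs[k] for k in scores.argmax(axis=1)]
    print(labels_to_spans(predicted))
```

Tagging every character, rather than the whole document, is what lets languages switch at arbitrary positions; the span boundaries then fall out of wherever the predicted label changes.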
