Toward Multilingual Identification of Online Registers


Abstract

We consider cross- and multilingual text classification approaches to the identification of online registers (genres), i.e. text varieties with specific situational characteristics. Register is arguably the most important predictor of linguistic variation, and register information could improve the potential of online data for many applications. We introduce the Finnish Corpus of Online REgisters (FinCORE), the first manually annotated non-English corpus of online registers featuring the full range of linguistic variation found online. The data set consists of 2,237 Finnish documents and follows the register taxonomy developed for the Corpus of Online Registers of English (CORE), the largest manually annotated language collection of online registers. Using CORE and FinCORE data, we demonstrate the feasibility of cross-lingual register identification using a simple approach based on convolutional neural networks and multilingual word embeddings. We further find that register identification results can be improved through multilingual training even when a substantial number of annotations is available in the target language.
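As a rough illustration of why a convolutional classifier over multilingual word embeddings can transfer across languages, the sketch below runs one convolution-plus-max-pooling step over a toy shared embedding space. It is not the authors' implementation: the embedding values, the two hand-set filters, and the register labels `instructional` and `news` are invented for this example, and in a real system the filters and output layer would be learned from annotated data such as CORE.

```python
from typing import Dict, List

Vector = List[float]

# Toy stand-in for aligned multilingual word embeddings (values invented):
# English/Finnish translation pairs sit close together in a shared 2-d space.
EMBEDDINGS: Dict[str, Vector] = {
    "recipe": [1.0, 0.0], "resepti": [0.9, 0.1],
    "butter": [0.8, 0.2], "voi": [0.8, 0.1],
    "election": [0.0, 1.0], "vaalit": [0.1, 0.9],
    "minister": [0.2, 0.8], "ministeri": [0.2, 0.9],
}
UNKNOWN: Vector = [0.0, 0.0]

# Hypothetical hand-set filters of width 2, one per illustrative register label.
FILTERS: Dict[str, List[Vector]] = {
    "instructional": [[1.0, 0.0], [1.0, 0.0]],
    "news": [[0.0, 1.0], [0.0, 1.0]],
}


def embed(tokens: List[str]) -> List[Vector]:
    """Look up each token in the shared embedding space."""
    return [EMBEDDINGS.get(t.lower(), UNKNOWN) for t in tokens]


def conv_max_pool(seq: List[Vector], filt: List[Vector]) -> float:
    """Slide one filter over the sequence, apply ReLU, keep the maximum."""
    width = len(filt)
    best = 0.0  # starting at 0 makes max() act as ReLU + max-over-time pooling
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        activation = sum(
            w * x
            for f_vec, s_vec in zip(filt, window)
            for w, x in zip(f_vec, s_vec)
        )
        best = max(best, activation)
    return best


def predict(tokens: List[str]) -> str:
    """Return the label whose filter responds most strongly to the document."""
    seq = embed(tokens)
    scores = {label: conv_max_pool(seq, f) for label, f in FILTERS.items()}
    return max(scores, key=scores.get)
```

Because the Finnish tokens land near their English counterparts in the shared space, the same filters respond to both languages: under these toy values, `predict(["recipe", "butter"])` and `predict(["resepti", "voi"])` select the same label without any Finnish-specific parameters.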


Bibliographic details

Source: Nordic conference of computational Linguistics | 2019 | xx 410 p. | 6 pages
Authors: Veronika Laippala; Roosa Kyllönen; Jesse Egbert; Douglas Biber; Sampo Pyysalo
Original format: PDF
Chinese Library Classification: Programming, software engineering
Date indexed: 2022-08-20 20:19:21

