Conference: Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

A Comparison of Character Neural Language Model and Bootstrapping for Language Identification in Multilingual Noisy Texts



Abstract

This paper examines the effect of including background knowledge in the form of a character-level pre-trained neural language model (LM), and of data bootstrapping, to overcome the problem of unbalanced limited resources. As a test case, we explore the task of language identification in mixed-language, short, non-edited texts involving an under-resourced language, namely Algerian Arabic, for which both labelled and unlabelled data are limited. We compare the performance of two traditional machine learning methods and a deep neural network (DNN) model. The results show that, overall, DNNs perform better on labelled data for the majority categories but struggle with the minority ones. While the effect of the untokenised and unlabelled data encoded as an LM differs for each category, bootstrapping improves the performance of all systems and all categories. These methods are language independent and could be generalised to other under-resourced languages for which a small labelled dataset and a larger unlabelled dataset are available.

