Source: Second workshop on subword and character level models in NLP, 2018

A Comparison of Character Neural Language Model and Bootstrapping for Language Identification in Multilingual Noisy Texts



Abstract

This paper examines the effect of including background knowledge in the form of a character-level pre-trained neural language model (LM), and of data bootstrapping, to overcome the problem of limited and unbalanced resources. As a test case, we explore language identification in mixed-language, short, non-edited texts involving an under-resourced language, namely Algerian Arabic, for which both labelled and unlabelled data are limited. We compare the performance of two traditional machine learning methods and a deep neural network (DNN) model. The results show that, overall, DNNs perform better on labelled data for the majority categories but struggle with the minority ones. While the effect of the untokenised, unlabelled data encoded as an LM differs across categories, bootstrapping improves the performance of all systems on all categories. These methods are language independent and could be generalised to other under-resourced languages for which a small labelled dataset and a larger unlabelled dataset are available.
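The bootstrapping the abstract refers to is a form of self-training: train a classifier on the small labelled set, pseudo-label the unlabelled texts the classifier is confident about, and retrain on the augmented data. The sketch below illustrates one such round; the character n-gram TF-IDF features, logistic regression classifier, and confidence threshold are illustrative assumptions, not the paper's exact setup.

```python
# Minimal self-training (bootstrapping) sketch for language identification.
# Assumed components: char n-gram TF-IDF + logistic regression; the paper's
# own classifiers and threshold may differ.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression


def bootstrap(labelled_texts, labels, unlabelled_texts, threshold=0.9):
    """One self-training round over character n-gram features."""
    vec = TfidfVectorizer(analyzer="char", ngram_range=(1, 3))
    X = vec.fit_transform(labelled_texts)
    clf = LogisticRegression(max_iter=1000).fit(X, labels)

    # Pseudo-label the unlabelled texts the model is confident about.
    U = vec.transform(unlabelled_texts)
    proba = clf.predict_proba(U)
    confident = proba.max(axis=1) >= threshold
    new_texts = [t for t, keep in zip(unlabelled_texts, confident) if keep]
    new_labels = clf.classes_[proba.argmax(axis=1)][confident]

    # Retrain on the original labelled data plus the pseudo-labelled data.
    X_aug = vec.transform(list(labelled_texts) + new_texts)
    clf = LogisticRegression(max_iter=1000).fit(
        X_aug, list(labels) + list(new_labels))
    return vec, clf
```

In practice the round can be repeated, moving newly confident examples into the training set each time; this is how bootstrapping can help all categories, including the minority ones, when unlabelled data is more plentiful than labelled data.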

Bibliographic details

  • Venue: New Orleans (US)
  • Author affiliations

    Department of Philosophy, Linguistics and Theory of Science (FLoV), Centre for Linguistic Theory and Studies in Probability (CLASP), University of Gothenburg;

    Department of Philosophy, Linguistics and Theory of Science (FLoV), Centre for Linguistic Theory and Studies in Probability (CLASP), University of Gothenburg;

    Department of Philosophy, Linguistics and Theory of Science (FLoV), Centre for Linguistic Theory and Studies in Probability (CLASP), University of Gothenburg;

    CEA, LIST, Vision and Content Engineering Laboratory Gif-sur-Yvette, France;

  • Format: PDF
  • Language: English
