Efficient Discrimination Between Closely Related Languages

Abstract

In this paper, we revisit the problem of language identification with a focus on proper discrimination between closely related languages. Strong similarities between certain languages make it very hard to classify them correctly using the standard methods proposed in the literature. Dedicated models that focus on specific discrimination tasks help to improve the accuracy of general-purpose language identification tools. We propose and compare two kinds of methods: simple document classification techniques trained on parallel corpora of closely related languages, and methods that emphasize discriminating features in the form of blacklisted words. Our experiments demonstrate that these techniques are highly accurate for the difficult task of discriminating between Bosnian, Croatian and Serbian. The best setup yields an absolute improvement of over 9% in accuracy over the best performing baseline, which uses a state-of-the-art language identification tool.
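As a rough illustration of the blacklist idea mentioned in the abstract, the sketch below counts how many tokens of a text appear on each language's blacklist and picks the language with the fewest hits. This is only a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not describe how the blacklists are built or scored, the names `BLACKLISTS`, `blacklist_hits` and `classify` are hypothetical, and the tiny word lists are hand-picked for illustration rather than derived from parallel corpora.

```python
import re

# Hypothetical toy blacklists (assumption, not from the paper): for each
# language code, words assumed NOT to occur in text of that language.
BLACKLISTS = {
    "hr": {"hleb", "hiljada", "vazduh"},  # forms typical of Serbian
    "sr": {"kruh", "tisuća", "zrak"},     # forms typical of Croatian
}


def tokenize(text):
    """Lowercase the text and split it into word tokens (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


def blacklist_hits(text, blacklists=BLACKLISTS):
    """Count, per language, how many tokens are on that language's blacklist."""
    tokens = tokenize(text)
    return {
        lang: sum(1 for token in tokens if token in words)
        for lang, words in blacklists.items()
    }


def classify(text, blacklists=BLACKLISTS):
    """Return the language with the fewest blacklist hits, or None on a tie."""
    hits = blacklist_hits(text, blacklists)
    fewest = min(hits.values())
    winners = [lang for lang, count in hits.items() if count == fewest]
    return winners[0] if len(winners) == 1 else None
```

A text with no blacklisted words at all produces a tie, so `classify` returns `None`; in practice one would presumably fall back to a general-purpose language identifier in that case.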

Bibliographic details

Source
International Conference on Computational Linguistics | 2012 | pp. 2619-2633 | 15 pages
Authors
Joerg TIEDEMANN; Nikola LJUBESIC
Original format
PDF
Keywords
language identification; language discrimination; closely related languages

Similar literature

1. Computing Efficiently the Closeness of Word Sets in Natural Language Texts [J]. DOMENICO CANTONE, SALVATORE CRISTOFARO, GIUSEPPE PAPPALARDO. International Journal of Computational Linguistics and Applications. 2015, Issue 1

2. Acoustic Feature Analysis and Discriminative Modeling for Language Identification of Closely Related South-Asian Languages [J]. Adeeba Farah, Hussain Sarmad. Circuits, Systems, and Signal Processing. 2018, Issue 8

3. Observation of unaveraged giant MEG activity from language areas during speech tasks in patients harboring brain lesions very close to essential language areas: expression of brain plasticity in language processing networks? [J]. Grummich P, Nimsky C, Fahlbusch R. Neuroscience Letters: An International Multidisciplinary Journal Devoted to the Rapid Publication of Basic Research in the Brain Sciences. 2005, Issue 1a2

4. Efficient Discrimination Between Closely Related Languages [C]. Joerg TIEDEMANN, Nikola LJUBESIC. International Conference on Computational Linguistics. 2012

5. mu-Conotoxins as modulators of electrical signaling in nerve and muscle: Molecular basis of sodium channel block and discrimination among closely related channels [D]. McArthur, Jeffrey Robert. 2011

6. Afrikaans and Dutch as closely-related languages: A comparison to West Germanic languages and Dutch dialects [O]. Wilbert Heeringa, Febe de Wet, Gerhard B. van Huyssteen. 2015
