Using Place Name Data to Train Language Identification Models

Abstract

The language of origin of a name affects its pronunciation, so language identification is an important technology for speech synthesis and recognition. Previous work on this task has typically used training sets that are proprietary or limited in coverage. In this work, we investigate the use of a publicly available geographic database for training language ID models. We automatically cluster place names by language, and show that models trained on place-name data are effective for language ID on person names. In addition, we compare several source-channel and direct models for language ID, and achieve a 24% reduction in error rate over a source-channel letter trigram model on a 26-way language ID task.
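
The baseline named above, a source-channel letter trigram model, scores each candidate language L by log P(L) + log P(name | L), where P(name | L) is a character trigram model trained on names from that language, and picks the highest-scoring L. The sketch below illustrates only that baseline, under our own assumptions: add-one smoothing, priors proportional to training-set size, "^"/"$" boundary padding, and a tiny two-language toy name list. It is not the authors' implementation, and neither their smoothing nor the direct models they compare are reproduced here.

```python
import math
from collections import Counter

BOS, EOS = "^", "$"  # assumed boundary symbols padding each name


def trigrams(name):
    """Yield (two-letter history, next letter) pairs for a padded, lower-cased name."""
    padded = BOS + BOS + name.lower() + EOS
    for i in range(2, len(padded)):
        yield padded[i - 2:i], padded[i]


class LetterTrigramLID:
    """Source-channel language ID: choose argmax over L of log P(L) + log P(name | L)."""

    def __init__(self, training_data):
        # training_data: hypothetical format, {language label: [names]}
        total = sum(len(names) for names in training_data.values())
        self.log_prior, self.tri, self.hist = {}, {}, {}
        self.alphabet = set()
        for lang, names in training_data.items():
            tri_counts, hist_counts = Counter(), Counter()
            for name in names:
                for hist, ch in trigrams(name):
                    tri_counts[(hist, ch)] += 1
                    hist_counts[hist] += 1
                    self.alphabet.add(ch)
            self.tri[lang], self.hist[lang] = tri_counts, hist_counts
            # Assumed prior: proportional to each language's share of the training names.
            self.log_prior[lang] = math.log(len(names) / total)

    def log_likelihood(self, name, lang):
        """log P(name | lang) under an add-one smoothed letter trigram model (assumed smoothing)."""
        vocab = len(self.alphabet) + 1  # +1 reserves probability mass for an unseen letter
        return sum(
            math.log((self.tri[lang][(hist, ch)] + 1) / (self.hist[lang][hist] + vocab))
            for hist, ch in trigrams(name)
        )

    def classify(self, name):
        return max(self.log_prior,
                   key=lambda lang: self.log_prior[lang] + self.log_likelihood(name, lang))


# Toy training lists (common surnames, for illustration only);
# the paper instead trains on place names automatically clustered by language.
training = {
    "de": ["schmidt", "schneider", "fischer", "schwarz", "schulz", "bachmann"],
    "it": ["rossi", "russo", "ferrari", "esposito", "bianchi", "romano"],
}
model = LetterTrigramLID(training)
print(model.classify("schubert"))  # de
print(model.classify("bianco"))    # it
```

Unseen names are classified from shared letter sequences ("sch" for German, "bia"/"an" for Italian). A 26-way task would use the same scoring with 26 per-language models.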

机译：名称的起源语言会影响其发音，因此语言识别是语音合成和识别的重要技术。以前有关此任务的工作通常使用专有或覆盖范围有限的培训集。在这项工作中，我们调查了使用公开可用的地理数据库来训练语言ID模型的情况。我们会自动按语言对地名进行聚类，并显示根据地名数据训练的模型对于人名上的语言ID有效。此外，我们对语言ID的几种源通道和直接模型进行了比较，与在26向语言ID任务上的源通道字母三元模型相比，错误率降低了24％。

Bibliographic Record

Source: European Conference on Speech Communication and Technology - EUROSPEECH 2003 (INTERSPEECH 2003), vol. 2; 1-4 September 2003; Geneva, Switzerland | 2003 | pp. 1349-1352 | 4 pages
Conference location: Geneva, Switzerland
Authors: Stanley F. Chen; Benoit Maison
Affiliation: IBM T.J. Watson Research Center, P.O. Box 218, Yorktown Heights, NY 10598
Original format: PDF
Language: English
Chinese Library Classification: automatic information theory

Similar Documents

1. Teach Your Robot Your Language! Trainable Neural Parser for Modeling Human Sentence Processing: Examples for 15 Languages [J]. Hinaut Xavier, Twiefel Johannes. IEEE Transactions on Cognitive and Developmental Systems, 2020, no. 2.
2. A General Technique to Train Language Models on Language Models [J]. Mark-Jan Nederhof. Computational Linguistics, 2005, no. 2.
3. Linguist Geeks on WNUT-2020 Task 2: COVID-19 Informative Tweet Identification using Progressive Trained Language Models and Data Augmentation [C]. Vasudev Awatramani, Anupam Kumar. Workshop on Noisy User-generated Text, 2020.
4. Logic, formal languages, and formal language identification. Some logical properties of the languages in the Chomsky hierarchy, and an interrogative model of formal language identification [D]. Pylkko, Pauli Olavi. 1988.
5. Enhancing African low-resource languages: Swahili data for language modelling [O]. Casper S. Shikali, Refuoe Mokhosi. 2020.
6. An Experimental Comparison of the Geometry of Models Trained on Natural Language and Synthetic Data [O]. Vincent Sippola, Robert E. Mercer. 2021.