PodCastle: Collaborative Training of Language Models on the Basis of Wisdom of Crowds


Abstract

This paper presents a language-model training method for improving automatic transcription of online spoken content. Unlike previously studied large-vocabulary continuous speech recognition (LVCSR) tasks such as broadcast news and lectures, large task-specific corpora for training language models cannot be prepared and used in recognition because of the diversity of topics, vocabularies, and speaking styles. To overcome the difficulty of preparing such task-specific language models in advance, we propose collaborative training of language models on the basis of the wisdom of crowds. On PodCastle, our public web service for LVCSR-based spoken document retrieval, over half a million recognition errors were corrected by anonymous users. By leveraging such corrected transcriptions, component language models for various topics can be built and dynamically mixed to generate an appropriate language model for each podcast episode in an unsupervised manner. Experimental results with Japanese podcasts showed that the mixed language models significantly reduced the word error rate.
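The abstract does not say how the component models are mixed, so the following is only a minimal sketch of one common approach, not the authors' method. It assumes linear interpolation of per-topic unigram models, with mixture weights estimated by EM on a first-pass recognition hypothesis of the episode (no reference transcript needed, hence unsupervised). All function names and the toy English data are hypothetical.

```python
import math
from collections import Counter

def train_unigram(sentences, vocab, alpha=1.0):
    """Add-alpha smoothed unigram model over a fixed, shared vocabulary."""
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def estimate_weights(models, words, iters=50):
    """EM for interpolation weights that maximize the likelihood of `words`."""
    k = len(models)
    weights = [1.0 / k] * k
    for _ in range(iters):
        resp = [0.0] * k
        for w in words:
            joint = [weights[i] * models[i][w] for i in range(k)]
            z = sum(joint)
            for i in range(k):
                resp[i] += joint[i] / z  # posterior of component i for this word
        weights = [r / len(words) for r in resp]
    return weights

def mix(models, weights):
    """Linear interpolation: P(w) = sum_i weight_i * P_i(w)."""
    return {w: sum(l * m[w] for l, m in zip(weights, models)) for w in models[0]}

def perplexity(model, words):
    return math.exp(-sum(math.log(model[w]) for w in words) / len(words))

# Hypothetical stand-ins for user-corrected transcriptions, grouped by topic.
sports = ["the team won the match", "the player scored a goal",
          "the coach praised the team"]
cooking = ["boil the pasta in water", "add salt to the sauce",
           "stir the sauce and serve"]
vocab = sorted({w for s in sports + cooking for w in s.split()})
models = [train_unigram(sports, vocab), train_unigram(cooking, vocab)]

# Hypothetical first-pass recognition output for one episode.
hypothesis = "the team scored a goal in the match".split()

weights = estimate_weights(models, hypothesis)
mixed = mix(models, weights)
uniform = mix(models, [0.5, 0.5])
print("weights:", weights)
print("perplexity (EM mix):", perplexity(mixed, hypothesis))
print("perplexity (uniform mix):", perplexity(uniform, hypothesis))
```

Because EM never decreases the likelihood of the text it is fitted to, the EM-weighted mixture scores no worse than the uniform starting mixture on the hypothesis. A real system would use higher-order n-gram models and then re-run recognition with the mixed model.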


Bibliographic details

Source
Annual Conference of the International Speech Communication Association, 2012, pp. 2367-2370 (4 pages)
Authors
Jun Ogata; Masataka Goto
Original format: PDF
Keywords
web service; LVCSR; language modeling; wisdom of crowds; error correction

Date indexed: 2022-08-26 15:11:03

