Taiwanese Corpus Collection via Continuous Speech Recognition Tool

Abstract

Corpora, in their different forms and for different purposes, have been the basis of modern natural language processing technology. Taiwanese (MinNan), like other members of the Sino-Tibetan language family, has been marginalized for many reasons. One consequence of this marginalization is that no standard written script exists, which has made collecting corpora for these languages extremely difficult. Even after (almost) arbitrarily selecting the hanlor written script (a mixture of hanzi and Roman characters), we still face the problem that only a few people are capable of phonetically transcribing a given Taiwanese text. On the other hand, reading a Taiwanese text is easier because many commonly used hanzi exist. By recording a person reading a Taiwanese text and using a continuous speech recognizer for Taiwanese to automatically transcribe it, we end up with two kinds of corpora, one in text and one in speech. The accuracy of the automatic phonetic transcription is about 96.05

Bibliographic details

Source: 6th International Conference on Spoken Language Processing (ICSLP 2000), Oct. 16-20, 2000, Beijing International Convention Center, Beijing, China; 2000; pp. 1031-1034 (4 pages)
Authors: Yuan-chin Chiang; Zhi-siang Yang; Ren-yuan Lyu
Original format: PDF
Chinese Library Classification: Culture and cultural undertakings of the countries of the world
