Taiwanese TV news-to-document index system

Abstract

This paper describes a system that indexes Taiwanese TV news speech to Chinese text documents on the World Wide Web. The system is based on two main techniques: automatic speech recognition (ASR) and bilingual text alignment. For the former, we use a speech-to-text approach to recognize the anchors' utterances in the TV news as Taiwanese tonal syllable sequences. We then translate the Chinese text documents, obtained from the corresponding news website, into Taiwanese tonal syllables using a bilingual pronunciation lexicon. Afterward, a dynamic programming algorithm performs syllable-level alignment to link the TV news to the documents. A speech corpus from about 100 speakers and text data comprising 840k Chinese characters were used to train the acoustic and language models for ASR. A bilingual lexicon containing 70k vocabulary entries serves as the resource for both the ASR pronunciation model and the statistical translation model used in bilingual text alignment. Finally, the document index system was evaluated on TV news comprising 40 stories, and the average indexing accuracy is over 82%.
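The syllable-level linking step can be illustrated with a minimal sketch. The abstract does not specify the cost function or the scoring rule, so this example assumes a plain unit-cost edit distance (Levenshtein) as a stand-in for the paper's dynamic programming alignment, and it picks the document with the lowest length-normalized distance. The function names `edit_distance` and `best_document`, the document identifiers, and the romanized syllables are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Unit-cost Levenshtein distance between two syllable sequences.

    Assumption: the paper's actual alignment costs are unknown; unit costs
    for substitution, insertion, and deletion are used here as a stand-in.
    """
    # prev[j] holds the distance between the first i-1 syllables of a
    # and the first j syllables of b.
    prev = list(range(len(b) + 1))
    for i, syl_a in enumerate(a, start=1):
        curr = [i]
        for j, syl_b in enumerate(b, start=1):
            substitution = prev[j - 1] + (syl_a != syl_b)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def best_document(recognized: Sequence[str],
                  documents: Dict[str, List[str]]) -> str:
    """Return the id of the document whose syllables best match the ASR output.

    Assumption: normalizing by the longer sequence length is an illustrative
    choice, not the paper's scoring rule.
    """
    def score(doc_id: str) -> float:
        doc = documents[doc_id]
        return edit_distance(recognized, doc) / max(len(recognized), len(doc), 1)

    return min(documents, key=score)


# Hypothetical data: syllables recognized from one news story, and two
# candidate documents already converted to syllables via a lexicon.
recognized = ["tai5", "uan5", "sin1", "bun5"]
documents = {
    "doc_a": ["tai5", "uan5", "sin1", "bun5", "po3"],
    "doc_b": ["thian1", "khi3", "u7", "ho7"],
}
print(best_document(recognized, documents))
```

Working at the syllable level means a single misrecognized syllable costs one edit rather than invalidating a whole word, which is one plausible reason to align syllables rather than words when ASR output is noisy.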
