IEICE Transactions on Information and Systems

Sounds of Speech Based Spoken Document Categorization: A Subword Representation Method


Abstract

In this paper, we explore a method for spoken document categorization, the task of automatically assigning spoken documents to a set of predetermined categories. To categorize spoken documents, we use subword unit representations as an alternative to the word units generated by either keyword spotting or large vocabulary continuous speech recognition (LVCSR). One advantage of using subword acoustic unit representations for spoken document categorization is that they require no prior knowledge about the contents of the spoken documents and they address the out-of-vocabulary (OOV) problem. Moreover, the method relies on the sounds of speech rather than on exact orthography: using subword units instead of words allows approximate matching on inaccurate transcriptions and makes "sounds-like" spoken document categorization possible. We also explore the performance of our method when the training set contains both perfect and errorful phonetic transcriptions, in the hope that the classifiers can learn the confusion characteristics of the recognizer and the pronunciation variants of words, and thereby improve the robustness of the whole system. Our experiments on both artificially corrupted and real corrupted data sets show that the proposed method is more effective and more robust than the word-based method.
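The abstract does not say which subword units or which classifier the authors use. As a minimal sketch, assuming overlapping phone bigrams as the subword units, ARPAbet-style phone symbols, and a cosine nearest-centroid classifier (a stand-in chosen for illustration, not necessarily the authors' classifier), the code below shows how a phone-level representation still matches a transcription containing a recognition error. The phone strings, category labels, and function names (`phone_ngrams`, `train_centroids`, `classify`) are all hypothetical.

```python
from collections import Counter
import math


def phone_ngrams(phones, n=2):
    """Count overlapping phone n-grams (the assumed subword units) in a transcription."""
    return Counter(tuple(phones[i:i + n]) for i in range(len(phones) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def train_centroids(docs, n=2):
    """docs: list of (phone_list, label). Sum the n-gram counts of each category."""
    centroids = {}
    for phones, label in docs:
        centroids.setdefault(label, Counter()).update(phone_ngrams(phones, n))
    return centroids


def classify(phones, centroids, n=2):
    """Assign the category whose centroid is most cosine-similar to the query."""
    feats = phone_ngrams(phones, n)
    return max(centroids, key=lambda label: cosine(feats, centroids[label]))


# Hypothetical training data: a clean transcription plus a simulated errorful
# one (ah -> eh) for "finance", so recognizer confusions enter the model.
TRAIN = [
    ("s t aa k m aa r k ah t p r ay s ah z".split(), "finance"),
    ("s t aa k m aa r k eh t".split(), "finance"),
    ("f uh t b ao l g ey m s k ao r".split(), "sports"),
]
centroids = train_centroids(TRAIN)

# Query "stock market" with a simulated substitution error (k -> g).
print(classify("s t aa g m aa r k ah t".split(), centroids))  # prints "finance"
```

Because the k → g substitution corrupts only the two bigrams around it, seven of the nine bigrams in the errorful query still occur in the finance training data, whereas a word-level system would see a misrecognized or OOV token with no match at all.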
