
Topic indexing of spoken documents based on optimized N-best approach


Abstract

In topic indexing of spoken documents, the aim is to reduce the word error rate rather than the whole-sentence error rate, so the center hypothesis among the N-best results is selected as the final output of the speech recognition system. Each spoken document is then represented as a high-dimensional vector in the vector space model, and non-negative matrix factorization (NMF) or singular value decomposition (SVD) is applied to map the vector space into a semantic space. Experimental results show that the optimized N-best approach is better suited to topic indexing than the one-best method. Combined with NMF, the rate of correct topic indexing reaches 98.1% with the optimized N-best approach, which is 0.9% higher than the one-best approach under the same conditions. When the dimension of the semantic space is reduced to 10, the difference between the two approaches is about 11.1%. Furthermore, compared with SVD, NMF offers better performance, faster computation, and lower storage requirements.
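
The abstract does not spell out how the center hypothesis is chosen. As a minimal sketch, assuming it is the N-best entry with the smallest total word-level edit distance to all other entries (unweighted, with no posterior scores), the selection could look like the following. The function names `word_edit_distance` and `center_hypothesis` and the toy N-best list are illustrative and do not come from the paper.

```python
from typing import List, Sequence


def word_edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two word sequences."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        curr = [i]
        for j, wb in enumerate(b, start=1):
            cost = 0 if wa == wb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def center_hypothesis(nbest: List[List[str]]) -> List[str]:
    """Return the hypothesis with the smallest summed word edit distance
    to every hypothesis in the list (assumed definition of 'center')."""
    if not nbest:
        raise ValueError("nbest must not be empty")
    totals = [sum(word_edit_distance(h, other) for other in nbest)
              for h in nbest]
    # On ties, the earlier (higher-ranked) hypothesis is kept.
    return nbest[totals.index(min(totals))]


# Hypothetical N-best list, ordered by recognizer score.
nbest = [
    "the cat sat on a mat".split(),
    "the cat sat on the mat".split(),
    "the cat sat in the mat".split(),
    "a cat sat on the hat".split(),
]
best = center_hypothesis(nbest)
```

Here the top-ranked hypothesis is not the one returned: the second entry is closer, in word errors, to the rest of the list, which is the sense in which such a choice targets word error rate rather than sentence error rate.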
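
The mapping from the vector space model into a lower-dimensional semantic space can be illustrated with a small NMF sketch. This is not the paper's implementation: it assumes raw term counts as document vectors and uses the standard multiplicative update rules for the Frobenius-norm objective, written in plain Python for self-containment. The helper names, the toy documents, and the choice of `rank=2` are all illustrative.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b]
            for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def term_document_matrix(docs: List[List[str]]) -> Tuple[Matrix, List[str]]:
    """Rows are vocabulary terms, columns are documents, entries are counts."""
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    v = [[0.0] * len(docs) for _ in vocab]
    for j, doc in enumerate(docs):
        for w in doc:
            v[index[w]][j] += 1.0
    return v, vocab


def nmf(v: Matrix, rank: int, iterations: int = 200,
        seed: int = 0, eps: float = 1e-9) -> Tuple[Matrix, Matrix]:
    """Factor V (terms x docs) as W (terms x rank) times H (rank x docs),
    keeping every entry non-negative via multiplicative updates."""
    rng = random.Random(seed)
    n_terms, n_docs = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n_terms)]
    h = [[rng.random() + eps for _ in range(n_docs)] for _ in range(rank)]
    for _ in range(iterations):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps)
              for j in range(n_docs)] for i in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps)
              for k in range(rank)] for i in range(n_terms)]
    return w, h


def frobenius_error(v: Matrix, w: Matrix, h: Matrix) -> float:
    approx = matmul(w, h)
    return sum((x - y) ** 2
               for rv, ra in zip(v, approx)
               for x, y in zip(rv, ra)) ** 0.5


# Hypothetical recognized transcripts, already tokenized.
docs = [
    "stock market shares rise".split(),
    "market shares fall stock".split(),
    "football match goal team".split(),
    "team wins football match".split(),
]
v, vocab = term_document_matrix(docs)
w, h = nmf(v, rank=2)
# Column j of h holds document j's coordinates in the 2-dimensional semantic space.
```

Topic indexing would then compare documents by their columns of `h` rather than by their raw term vectors. An SVD-based variant would replace `nmf` with a truncated singular value decomposition, at the cost of factors that may contain negative entries.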
