Spoken document retrieval using topic models


Abstract

In this paper, we propose a document topic model (DTM) based on non-negative matrix factorization (NMF) for spontaneous spoken document retrieval. The model uses latent semantic indexing to detect underlying semantic relationships within documents. Each document is interpreted as a generative topic model belonging to many topics, and the relevance of a document to a query is expressed as the probability that the model generates the query. The term-document matrix used for NMF is built stochastically from the speech recognition N-best results, so that multiple recognition hypotheses can be used to compensate for word recognition errors. Experiments are conducted on a test collection from the Corpus of Spontaneous Japanese (CSJ), with 39 queries over more than 600 hours of spontaneous Japanese speech. The retrieval performance of this model is shown to be superior to that of the conventional vector space model (VSM) when the dimension, or number of topics, exceeds a certain threshold. Moreover, in terms of both retrieval performance and the ability to express topics, the NMF-based topic model is shown to surpass another latent indexing method based on singular value decomposition (SVD). The extent to which this topic model can resist speech recognition errors, a problem specific to spoken document retrieval, is also investigated.
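
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions made for illustration: the term-document matrix holds posterior-weighted expected word counts over the N-best hypotheses, the factorization uses the standard multiplicative updates for the KL-divergence form of NMF, and a query is scored by the log of P(q|d) = Π_w Σ_t P(w|t) P(t|d). The function names and the toy data are hypothetical.

```python
import numpy as np

def expected_counts(nbest_docs, vocab):
    """Term-document matrix of expected word counts.

    nbest_docs: one entry per document, each a list of
    (posterior, words) recognition hypotheses. Weighting counts by
    normalized posteriors is an assumption of this sketch.
    """
    index = {w: i for i, w in enumerate(vocab)}
    V = np.zeros((len(vocab), len(nbest_docs)))
    for d, hyps in enumerate(nbest_docs):
        total = sum(p for p, _ in hyps)
        for p, words in hyps:
            for w in words:
                V[index[w], d] += p / total
    return V

def nmf_kl(V, k, iters=200, seed=0, eps=1e-12):
    """Factor V ~ W @ H with multiplicative updates (KL divergence)."""
    rng = np.random.default_rng(seed)
    n_terms, n_docs = V.shape
    W = rng.random((n_terms, k)) + eps
    H = rng.random((k, n_docs)) + eps
    for _ in range(iters):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H

def topic_model(W, H, eps=1e-12):
    """Normalize the factors into P(w|t) and P(t|d)."""
    scale = W.sum(axis=0) + eps
    p_w_t = W / scale                      # each column sums to ~1
    Hs = H * scale[:, None]                # move the scale into H
    p_t_d = Hs / (Hs.sum(axis=0) + eps)    # each column sums to ~1
    return p_w_t, p_t_d

def query_log_likelihood(query, vocab, p_w_t, p_t_d, eps=1e-12):
    """log P(q|d) for every document; out-of-vocabulary words are skipped."""
    index = {w: i for i, w in enumerate(vocab)}
    p_w_d = p_w_t @ p_t_d                  # shape: (terms, docs)
    rows = [index[w] for w in query if w in index]
    return np.log(p_w_d[rows] + eps).sum(axis=0)

# Toy data: documents 0-1 are about speech, documents 2-3 about sports.
# The second hypothesis of document 0 contains a recognition error.
vocab = ["speech", "recognition", "acoustic", "soccer", "goal", "team"]
nbest_docs = [
    [(0.7, ["speech", "recognition", "acoustic"]),
     (0.3, ["speech", "recognition", "goal"])],
    [(1.0, ["acoustic", "speech", "speech", "recognition"])],
    [(0.6, ["soccer", "goal", "team"]),
     (0.4, ["soccer", "goal", "goal"])],
    [(1.0, ["team", "soccer", "goal", "team"])],
]

V = expected_counts(nbest_docs, vocab)
W, H = nmf_kl(V, k=2)
p_w_t, p_t_d = topic_model(W, H)
scores = query_log_likelihood(["speech", "recognition"], vocab, p_w_t, p_t_d)
ranking = np.argsort(-scores)              # best-matching document first
```

Because every entry of `W` and `H` stays non-negative, the normalized factors can be read directly as probability distributions, which an SVD-based decomposition does not allow without further processing.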
