
Fast latent semantic indexing of spoken documents by using self-organizing maps



Abstract

This paper describes a new latent semantic indexing (LSI) method for spoken audio documents. The framework indexes broadcast news from radio and TV by combining large vocabulary continuous speech recognition (LVCSR), natural language processing (NLP), and information retrieval (IR). For indexing, the documents are represented as vectors of word counts, whose dimensionality is rapidly reduced by random mapping (RM). The resulting vectors are projected into the latent semantic subspace determined by SVD, where they are then smoothed by a self-organizing map (SOM). Smoothing by the closest document clusters is important here, because the documents are often short and have a high word error rate (WER). Because the clusters in the semantic subspace reflect the news topics, the SOMs provide an easy way to visualize the index and query results and to explore the database. Test results are reported for TREC's spoken document retrieval databases (www.idiap.ch/kurimo/thisl.html).
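The indexing chain named in the abstract (word counts, random mapping, SVD projection, SOM smoothing) can be sketched roughly as follows. This is a minimal illustration with NumPy, not the paper's implementation: the Gaussian random matrix, the dimensions, the 3x3 map, the linear decay schedule, and smoothing by averaging the two closest map units are all assumptions chosen for the example, and the function names are hypothetical.

```python
import numpy as np

def random_mapping(counts, out_dim, rng):
    """Reduce (n_docs, vocab) word counts to out_dim with a random Gaussian matrix."""
    r = rng.standard_normal((counts.shape[1], out_dim)) / np.sqrt(out_dim)
    return counts @ r

def lsi_project(x, k):
    """Project the rows of x onto the top-k right singular vectors of x."""
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return x @ vt[:k].T

def train_som(data, rng, grid=(3, 3), epochs=50, lr0=0.5, sigma0=1.0):
    """Train a small SOM with online updates; returns (n_units, dim) weights."""
    rows, cols = grid
    coords = np.array([(i, j) for i in range(rows) for j in range(cols)], dtype=float)
    # Initialise the units from randomly chosen data vectors (assumed scheme).
    weights = data[rng.integers(0, len(data), rows * cols)].copy()
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                 # learning rate decays linearly
        sigma = sigma0 * (1.0 - frac) + 0.1     # neighbourhood width shrinks
        for x in data[rng.permutation(len(data))]:
            bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
            dist2 = np.sum((coords - coords[bmu]) ** 2, axis=1)
            h = np.exp(-dist2 / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
    return weights

def smooth(data, weights, n_best=2):
    """Replace each vector by the mean of its n_best closest SOM units (assumed rule)."""
    d = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :n_best]
    return weights[nearest].mean(axis=1)

rng = np.random.default_rng(0)
counts = rng.poisson(0.3, size=(40, 500)).astype(float)  # synthetic word counts
reduced = random_mapping(counts, 50, rng)   # 500 -> 50 dimensions
latent = lsi_project(reduced, 10)           # 50 -> 10 latent dimensions
som = train_som(latent, rng)
index = smooth(latent, som)
print(index.shape)
```

The word counts here are synthetic; with real data each row would come from the LVCSR transcript of one news story. Running the SVD on the randomly mapped matrix instead of the full vocabulary matrix is what keeps the decomposition cheap.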


Bibliographic details

Source: 2000, pp. 2425-2428 (4 pages)
Author: Kurimo, M.
Original format: PDF
Chinese Library Classification: Radio electronics, telecommunications technology

Similar literature

1. LSISOM - A Latent Semantic Indexing Approach to Self-Organizing Maps of Document Collections [J]. Nikolaos Ampazis, Stavros J. Perantonis. Neural Processing Letters. 2004, No. 2

2. Thematic indexing of spoken documents by using self-organizing maps [J]. Mikko Kurimo. Speech Communication. 2002, No. 1-2

3. Semantic Analysis and Organization of Spoken Documents Based on Parameters Derived From Latent Topics [J]. Kong S.-Y., Lee L.-S. IEEE Transactions on Audio, Speech, and Language Processing. 2011, No. 7

4. Fast latent semantic indexing of spoken documents by using self-organizing maps [C]. Kurimo M. IEEE International Conference on Acoustics, Speech, and Signal Processing. 2000

5. Study of document retrieval using Latent Semantic Indexing (LSI) on a very large data set [D]. Zaman, A. N. K. 2010

6. Evaluation of Co-occurring Terms in Clinical Documents Using Latent Semantic Indexing [O]. Choonghyun Han, Sooyoung Yoo, Jinwook Choi. 2011

7. Fast Latent Semantic Indexing of Spoken Documents by Using Self-Organizing Maps [O]. Mikko Kurimo. 1999
