CONCEPT BASED CROSS MEDIA INDEXING AND RETRIEVAL OF SPEECH DOCUMENTS

Abstract

Indexing, searching, and retrieving the content of speech documents (including but not limited to recorded books, audio broadcasts, and recorded conversations) is accomplished by finding and retrieving speech documents that are related to a query term at a conceptual level, even if the speech documents do not contain the spoken (or textual) query terms. Concept-based cross-media information retrieval is used. A term-phoneme/document matrix is constructed from a training set of documents. Documents are then added to the matrix constructed from the training data. Singular Value Decomposition is used to compute a vector space from the term-phoneme/document matrix. The result is a lower-dimensional numerical space in which conceptually related term-phoneme and document vectors are nearest neighbors. A query engine computes a cosine value between the query vector and all other vectors in the space and returns a list of the term-phonemes and/or documents with the highest cosine values.
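The retrieval pipeline summarized above resembles latent semantic indexing (LSI). The following Python sketch (using NumPy) illustrates it under several assumptions that the abstract does not state:

- Raw term counts are used with no weighting.
- Phoneme strings are represented as tokens in a made-up `ph:...` notation.
- New documents are added by the standard LSI fold-in rather than by recomputing the SVD. Fold-in projects a count vector d onto the retained left singular vectors as d^T U_k.
- Only documents are ranked.

The helper names (`build_matrix`, `to_counts`, `truncated_svd`, `fold_in`, `rank`) and the toy corpus are hypothetical. This is not the patented implementation.

```python
import numpy as np


def build_matrix(docs):
    """Count matrix with one row per word or phoneme token and one column per document."""
    vocab = sorted({tok for doc in docs for tok in doc})
    row = {tok: i for i, tok in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for tok in doc:
            A[row[tok], j] += 1.0
    return A, row


def to_counts(tokens, row):
    """Count vector over the training vocabulary; tokens unseen in training are dropped."""
    d = np.zeros(len(row))
    for tok in tokens:
        if tok in row:
            d[row[tok]] += 1.0
    return d


def truncated_svd(A, k):
    """Keep the k largest singular triplets, so A is approximated by U_k diag(s_k) V_k^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :].T


def fold_in(d, U_k):
    """Project a count vector into the k-dim space as d^T U_k.

    For a training column this reproduces that document's row of V_k diag(s_k),
    so folded-in documents and queries share coordinates with training documents.
    """
    return d @ U_k


def rank(query, vectors, labels, top=3):
    """Return the `top` (label, cosine) pairs with the highest cosine to `query`."""
    norms = np.linalg.norm(vectors, axis=1) * np.linalg.norm(query)
    sims = (vectors @ query) / np.where(norms == 0.0, 1.0, norms)
    order = np.argsort(-sims)[:top]
    return [(labels[i], float(sims[i])) for i in order]


# Hypothetical training set: each document pairs transcript words with phoneme
# tokens ("ph:..." is a made-up notation for this sketch).
train = [
    ["cat", "pet", "ph:k-ae-t"],
    ["cat", "purr", "ph:k-ae-t", "ph:p-er"],
    ["stock", "market", "ph:s-t-aa-k"],
    ["stock", "trade", "price", "ph:s-t-aa-k", "ph:t-r-ey-d"],
]
A, row = build_matrix(train)
U_k, s_k, V_k = truncated_svd(A, k=2)

# Training documents sit at the rows of V_k diag(s_k).
doc_vecs = V_k * s_k
labels = ["cat-1", "cat-2", "stock-1", "stock-2"]

# Add a speech-only document (phoneme tokens, no transcript) by folding it in.
speech = fold_in(to_counts(["ph:k-ae-t", "ph:p-er"], row), U_k)
doc_vecs = np.vstack([doc_vecs, speech])
labels.append("speech-only")

# A text-only query is matched against every document vector by cosine.
query = fold_in(to_counts(["cat"], row), U_k)
results = rank(query, doc_vecs, labels, top=3)
print(results)
```

In this toy corpus, the training documents pair the word `cat` with the phoneme tokens `ph:k-ae-t` and `ph:p-er`, so the SVD places them on a shared latent dimension. As a result, the text-only query `cat` scores a cosine close to 1.0 against the phoneme-only document, even though the two share no tokens. The stock-market documents score close to 0. Because the corpus is tiny, the three cat-related documents tie near 1.0. Practical LSI systems usually apply a term weighting such as tf-idf or log-entropy before the SVD and keep far more than two dimensions.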
