VIDEO INDEXING BASED ON IMAGE AND SOUND

Abstract

Video indexing is a major challenge for both scientific and economic reasons. Information extraction can sometimes be easier from the sound channel than from the image channel. We first present a multi-channel and multi-modal query interface for querying sound, image and script through "pull" and "push" queries. We then summarize the segmentation phase, which needs information from the image channel. We propose detecting critical segments, which should speed up both automatic and manual indexing. We then present an overview of the information extraction phase. Information can be extracted from the sound channel through speaker recognition, vocal dictation with unconstrained vocabularies, and script alignment with speech (or "script warping"). We present experimental results for these techniques. Speaker recognition methods were tested on the TIMIT and NTIMIT databases. Vocal dictation was evaluated on newspaper sentences spoken by several speakers. Script alignment was tested on part of a cartoon movie, "Ivanhoe". For good-quality sound segments, error rates are low enough for use in indexing applications. The major open issues are the processing of sound segments containing noise or music, and performance improvement through the use of appropriate, low-cost parallel architectures or networks of workstations.