Speaker identification based text to audio alignment for an audio retrieval system

Abstract

We report on an audio retrieval system that lets Internet users efficiently access a large audio database containing recordings of the proceedings of the United States House of Representatives. The audio has been temporally aligned to text transcripts of the proceedings (which are manually generated by the US Government) using a novel method based on speaker identification. Speaker sequence and approximate timing information are extracted from the text transcript and used to constrain a Viterbi alignment of speaker models to the observed audio. Speakers are modeled by computing Gaussian statistics of cepstral coefficients extracted from samples of each person's speech. The speaker identification is used to locate speaker transition points in the audio, which are then linked to the corresponding speaker transitions in the text transcript. The alignment system has been successfully integrated into a World Wide Web-based search and browse system as an experimental service on the Internet.
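The constrained alignment described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes each audio frame has already been reduced to a cepstral feature vector, that each speaker is summarized by a diagonal-covariance Gaussian (the abstract says only "Gaussian statistics"), and that the transcript supplies only the order of speaker turns; the approximate timing constraint is left out. The names `fit_gaussian`, `log_likelihood`, and `align` are hypothetical.

```python
import math

def fit_gaussian(frames, var_floor=1e-3):
    """Per-dimension mean and variance of a speaker's training frames.
    Assumption: diagonal covariance, with a floor to avoid zero variance."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, var_floor)
           for d in range(dim)]
    return mean, var

def log_likelihood(frame, model):
    """Log density of one frame under a diagonal Gaussian speaker model."""
    mean, var = model
    return sum(-0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
               for x, m, v in zip(frame, mean, var))

def align(frames, turns, models):
    """Left-to-right Viterbi alignment: state k is the k-th speaker turn in
    the transcript; each frame either stays in its turn or advances to the
    next one. Returns the start frame index of every turn."""
    n_frames, n_turns = len(frames), len(turns)
    if n_frames < n_turns:
        raise ValueError("need at least one frame per speaker turn")
    neg_inf = float("-inf")
    score = [neg_inf] * n_turns
    score[0] = log_likelihood(frames[0], models[turns[0]])
    entered_log = []  # entered_log[t-1][k]: turn k was entered at frame t
    for t in range(1, n_frames):
        new_score = [neg_inf] * n_turns
        entered = [False] * n_turns
        for k in range(n_turns):
            stay = score[k]
            move = score[k - 1] if k > 0 else neg_inf
            best = stay
            if move > stay:
                best, entered[k] = move, True
            if best > neg_inf:
                new_score[k] = best + log_likelihood(frames[t], models[turns[k]])
        score = new_score
        entered_log.append(entered)
    # Trace back from the last turn at the last frame.
    starts = [0] * n_turns
    k = n_turns - 1
    for t in range(n_frames - 1, 0, -1):
        if entered_log[t - 1][k]:
            starts[k] = t
            k -= 1
    return starts

# Toy usage with made-up one-dimensional "cepstral" features.
models = {
    "A": fit_gaussian([[0.0], [0.2], [-0.2]]),
    "B": fit_gaussian([[5.0], [5.2], [4.8]]),
}
audio = [[0.1], [-0.2], [0.0], [5.1], [4.9], [0.2], [-0.1]]
print(align(audio, ["A", "B", "A"], models))  # [0, 3, 5]
```

The returned indices are the speaker transition points in the audio, which can then be linked to the matching speaker changes in the transcript. A timing constraint like the one the abstract mentions could be approximated by only allowing a turn to begin within a window around its expected frame.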