Venue: ACM Workshop on Searching Spontaneous Conversational Speech

Towards Methods for Efficient Access to Spoken Content in the AMI Corpus

Abstract

Increasing amounts of informal spoken content are being collected, e.g. recordings of meetings, lectures and personal data sources. This material does not have clearly defined document forms, either in terms of structure or topical content. Automated search of this content poses challenges beyond retrieval of defined documents, including definition of search items and location of relevant content within them. While most existing work on speech search has focused on clearly defined document units, in this paper we describe our initial investigation into search of meeting content using the AMI meeting collection. Manual and automated transcripts of meetings are first automatically segmented into topical units. A known-item search task is then performed using presentation slides from the meetings as search queries to locate relevant sections of the meetings. Three sets of query slides were selected: slides corresponding to well-recognised spoken content, slides corresponding to poorly recognised spoken content, and randomly selected slides. Experimental results show that relevant items can be located with reasonable accuracy using a standard information retrieval approach, and that there is a clear relationship between automatic transcription accuracy and retrieval effectiveness.
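
The known-item search setup described in the abstract can be illustrated with a minimal sketch. The abstract does not name the retrieval model, the segmentation algorithm, or the evaluation measure, so everything here is an assumption for illustration: a plain TF-IDF dot-product score stands in for the "standard information retrieval approach", the topical segments are given by hand, and reciprocal rank is used as a typical known-item measure. The function names (`tokenize`, `rank_segments`, `reciprocal_rank`) and the toy segment texts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def rank_segments(query, segments):
    """Rank segment ids by a TF-IDF dot-product score against the query.

    Assumption: this simple scoring (no length normalisation) is only a
    stand-in for whatever retrieval model the paper actually used.
    """
    docs = {sid: Counter(tokenize(text)) for sid, text in segments.items()}
    n_docs = len(docs)

    # Document frequency: number of segments containing each term.
    df = Counter()
    for counts in docs.values():
        df.update(counts.keys())
    idf = {t: math.log((n_docs + 1) / (d + 1)) + 1.0 for t, d in df.items()}

    q_counts = Counter(tokenize(query))
    scores = {
        sid: sum(q_counts[t] * counts[t] * idf[t] for t in q_counts if t in counts)
        for sid, counts in docs.items()
    }
    # Highest score first; ties are broken by segment id for determinism.
    return sorted(scores, key=lambda sid: (-scores[sid], sid))


def reciprocal_rank(ranking, target):
    """Return 1/rank of the known item, or 0.0 if it was not retrieved."""
    return 1.0 / (ranking.index(target) + 1) if target in ranking else 0.0


# Toy topical segments (hypothetical text, not taken from the AMI corpus).
segments = {
    "seg1": "we should discuss the project budget and the remaining budget",
    "seg2": "the remote control needs a new button layout and a scroll wheel",
    "seg3": "next meeting we will review the marketing survey results",
}

# Text of a presentation slide used as the query; its known item is seg2.
slide_query = "Remote control design: button layout, scroll wheel"

ranking = rank_segments(slide_query, segments)
rr = reciprocal_rank(ranking, "seg2")
```

Averaging the reciprocal rank over many query slides gives a single effectiveness figure. Under the same assumptions, running the sketch once on manual transcripts and once on automatic transcripts would be one way to compare retrieval effectiveness against transcription accuracy.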