Conference: European Conference on Speech Communication and Technology

Multi-Scale Retrieval in MEI: An English-Chinese Translingual Speech Retrieval System

Abstract

This paper presents a multi-scale retrieval approach in MEI (Mandarin-English Information), an English-Chinese cross-lingual spoken document retrieval (CL-SDR) system. MEI accepts an entire English news story (from newspaper text) as the input query and automatically retrieves "relevant" Mandarin news stories (from broadcast audio). This allows the user to search for personally relevant content across the language and media barriers - a cross-lingual and cross-media retrieval task. MEI advocates a multi-scale paradigm for the retrieval task. Multi-scale refers to the use of both words and subwords (Chinese characters and syllables) for retrieval. Words offer lexical knowledge to enhance precision, and subwords can potentially alleviate some prevailing problems in CL-SDR, e.g. open vocabularies in translation and recognition, out-of-vocabulary words in audio indexing, and ambiguities in Chinese homophones and word tokenization. We present techniques for word-subword fusion, which improved retrieval performance in our experiments with the Topic Detection and Tracking collection.
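
The abstract does not say how word and subword evidence are combined. As a rough illustration only, the sketch below scores a document at two scales (word tokens and overlapping character bigrams) and fuses the two cosine similarities by linear interpolation. The choice of bigrams, the cosine measure, the `weight` parameter, and all function names are assumptions made for this example and are not taken from the paper.

```python
import math
from collections import Counter


def char_bigrams(text):
    """Overlapping character bigrams, ignoring whitespace (assumed subword unit)."""
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + 2]) for i in range(len(chars) - 1)]


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fused_score(query_words, doc_words, query_text, doc_text, weight=0.5):
    """Linear interpolation of word-level and subword-level scores.

    `weight` is a hypothetical tuning parameter: 1.0 uses words only,
    0.0 uses character bigrams only.
    """
    word_score = cosine(Counter(query_words), Counter(doc_words))
    subword_score = cosine(Counter(char_bigrams(query_text)),
                           Counter(char_bigrams(doc_text)))
    return weight * word_score + (1.0 - weight) * subword_score


# The query and document tokenize the same string differently, so the
# word-level match fails while the character bigrams still agree.
score = fused_score(["北京", "大学"], ["北京大学"], "北京大学", "北京大学")
print(score)
```

In this toy case the word-level score is zero because the two tokenizations share no word, yet the subword scale still finds the match. This is the kind of tokenization ambiguity the abstract says subwords can alleviate.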