Utilisation of metadata fields and query expansion in cross-lingual search of user-generated Internet video

Abstract

Recent years have seen significant efforts in the area of Cross Language Information Retrieval (CLIR) for text retrieval. This work initially focused on formally published content, but more recently research has begun to concentrate on CLIR for informal social media content. However, despite the current expansion in online multimedia archives, there has been little work on CLIR for this content. While there has been some limited work on Cross-Language Video Retrieval (CLVR) for professional videos, such as documentaries or TV news broadcasts, there has to date been no significant investigation of CLVR for the rapidly growing archives of informal user-generated content (UGC). Key differences between such UGC and professionally produced content are the nature and structure of the textual UGC metadata associated with it, as well as the form and quality of the content itself. In this setting, retrieval effectiveness may suffer not only from translation errors common to all CLIR tasks, but also from recognition errors associated with the automatic speech recognition (ASR) systems used to transcribe the spoken content of the video, and from the informality and inconsistency of the associated user-created metadata for each video. This work proposes and evaluates techniques to improve CLIR effectiveness for such noisy UGC content. Our experimental investigation shows that different sources of evidence, e.g. the content from different fields of the structured metadata, significantly affect CLIR effectiveness. Results from our experiments also show that each metadata field has a varying robustness to query expansion (QE), and hence QE can have a negative impact on CLIR effectiveness. Our work proposes a novel adaptive QE technique that predicts the most reliable source for expansion and shows how this technique can be effective for improving CLIR effectiveness for UGC content.