Published in: Conference on Cross-Language Search and Summarization of Text and Speech

Corpora for Cross-Language Information Retrieval in Six Less-Resourced Languages


Abstract

The Machine Translation for English Retrieval of Information in Any Language (MATERIAL) research program, sponsored by the Intelligence Advanced Research Projects Activity (IARPA), focuses on rapid development of end-to-end systems capable of retrieving foreign-language speech and text documents relevant to different types of English queries that may be further restricted by domain. Those systems also provide evidence of relevance of the retrieved content in the form of English summaries. The program focuses on Less-Resourced Languages and provides its performer teams with very limited amounts of annotated training data. This paper describes the corpora that were created for system development and evaluation for the six languages released by the program to date: Tagalog, Swahili, Somali, Lithuanian, Bulgarian and Pashto. The corpora include build packs to train Machine Translation and Automatic Speech Recognition systems; document sets in three text and three speech genres, annotated for domain and partitioned for analysis, development and evaluation; and queries of several types together with corresponding binary relevance judgments against the entire set of documents. The paper also describes a detection metric called Actual Query Weighted Value, developed by the program to evaluate end-to-end system performance.
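The abstract names the Actual Query Weighted Value metric but does not give its formula. As a rough illustration only, the Python sketch below assumes the commonly cited detection-style form, 1 - (P_miss + beta * P_FA), with the miss rate averaged over queries that have at least one relevant document and the false-alarm rate averaged over all queries. The function name `aqwv`, its parameters, and the value of `beta` are hypothetical choices for this sketch and are not taken from the paper.

```python
from typing import Dict, Set


def aqwv(
    system: Dict[str, Set[str]],
    truth: Dict[str, Set[str]],
    n_docs: int,
    beta: float,
) -> float:
    """Sketch of a query-weighted detection value (assumed form, see lead-in).

    system: query id -> set of document ids the system returned
    truth:  query id -> set of document ids judged relevant (binary judgments)
    n_docs: total number of documents in the collection
    beta:   weight on false alarms relative to misses (an assumption here)
    """
    miss_rates = []
    false_alarm_rates = []
    for query_id, relevant in truth.items():
        retrieved = system.get(query_id, set())
        n_relevant = len(relevant)
        n_nonrelevant = n_docs - n_relevant
        # The miss rate is only defined when the query has relevant documents.
        if n_relevant > 0:
            miss_rates.append(len(relevant - retrieved) / n_relevant)
        # The false-alarm rate is the share of non-relevant documents returned.
        if n_nonrelevant > 0:
            false_alarm_rates.append(len(retrieved - relevant) / n_nonrelevant)
    p_miss = sum(miss_rates) / len(miss_rates) if miss_rates else 0.0
    p_fa = (
        sum(false_alarm_rates) / len(false_alarm_rates)
        if false_alarm_rates
        else 0.0
    )
    return 1.0 - (p_miss + beta * p_fa)
```

Under this assumed form, a system that returns exactly the relevant documents for every query scores 1.0, a system that returns nothing scores 0.0, and a system that returns many non-relevant documents can score below zero because false alarms are weighted by `beta`.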
机译:由情报高级研究项目活动(IARPA)赞助的“以任何语言进行英语英语检索信息的机器翻译”(MATERIAL)研究计划,着重于能够检索与外语相关的语音和文本文档的端到端系统的快速开发可能会受到域进一步限制的不同类型的英语查询。这些系统还以英语摘要的形式提供了与检索到的内容相关的证据。该程序侧重于资源较少的语言,并为其表演者团队提供了数量非常有限的带注释的培训数据。本文介绍了为系统开发和评估而创建的语料库。该程序迄今已发布的六种语言:塔加洛语,斯瓦希里语,索马里语,立陶宛语,保加利亚语和普什图语。语料库包括用于训练机器翻译和自动语音识别系统的构建包;具有三种文本和三种语音类型的文档集,分别标注了域和分区以进行分析,开发和评估;以及针对整个文档集的几种类型的查询以及相应的二进制相关性判断。本文还介绍了一种由程序开发的称为“实际查询加权值”的检测指标,用于评估端到端系统性能。
