With the increasing democratization of electronic media, vast information resources are available in less-frequently-taught languages such as Swahili or Somali. That information, which may be crucially important and not available elsewhere, can be difficult for monolingual English speakers to access effectively. In this paper we present SARAL, an end-to-end cross-lingual information retrieval (CLIR) and summarization system for low-resource languages that 1) enables English speakers to search foreign-language repositories of text and audio using English queries, 2) summarizes the retrieved documents in English with respect to a particular information need, and 3) provides complete transcriptions and translations as needed. The SARAL system achieved the top end-to-end performance in the most recent IARPA MATERIAL CLIR+summarization evaluations.
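To make the CLIR flow concrete, the toy sketch below shows one common pattern: translate the English query into the document language, score foreign-language documents by matches, and report in English which query concepts each hit contains. It is not SARAL's method. The names `LEXICON`, `DOCS`, `tokenize`, `translate_query`, and `search` are hypothetical, and the hand-written word list and count-based score are simplifying assumptions. A real system would use learned translation models, speech recognition for audio, and a proper ranking function.

```python
import re
from collections import Counter

# Hypothetical toy English -> Swahili lexicon (assumption for illustration only;
# real CLIR systems use learned or probabilistic translation resources).
LEXICON: dict[str, list[str]] = {
    "flood": ["mafuriko"],
    "rain": ["mvua"],
    "water": ["maji"],
    "school": ["shule"],
}

# Hypothetical sample Swahili documents (English glosses in comments).
DOCS: list[str] = [
    "Mvua kubwa ilinyesha na mafuriko yakaharibu shule",   # heavy rain fell and floods destroyed a school
    "Bei ya chakula imepanda sokoni",                      # food prices have risen at the market
    "Mafuriko yalikumba kijiji; maji yalifunika barabara", # floods hit the village; water covered the road
]


def tokenize(text: str) -> list[str]:
    """Lowercase and split into word tokens, dropping punctuation."""
    return re.findall(r"\w+", text.lower())


def translate_query(query: str) -> dict[str, list[str]]:
    """Query translation: map each English term to its foreign-language candidates."""
    return {word: LEXICON.get(word, []) for word in tokenize(query)}


def search(query: str, docs: list[str], k: int = 2) -> list[tuple[int, int, list[str]]]:
    """Return up to k (doc_index, score, english_concepts_found) tuples, best first."""
    translated = translate_query(query)
    results = []
    for i, doc in enumerate(docs):
        counts = Counter(tokenize(doc))
        # Score: total occurrences of any translated query term in the document.
        score = sum(counts[t] for cands in translated.values() for t in cands)
        # Toy query-focused English "summary": which query concepts the document covers.
        found = [w for w, cands in translated.items() if any(counts[t] for t in cands)]
        if score > 0:
            results.append((i, score, found))
    # Rank by descending score, breaking ties by document order.
    results.sort(key=lambda r: (-r[1], r[0]))
    return results[:k]
```

For example, `search("flood water", DOCS)` ranks the third document first (both concepts present) and the first document second (only "flood"). Translating the query rather than every document keeps the sketch small. Translating documents into English is the other common CLIR design, and it is also what makes full English translations available to users.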