Journal: Information Retrieval

Cross-lingual training of summarization systems using annotated corpora in a foreign language

Abstract

The increasing trend of cross-border globalization and acculturation requires text summarization techniques to work equally well for multiple languages. However, only some of the automated summarization methods can be defined as “language-independent,” i.e., not based on any language-specific knowledge. Such methods can be used for multilingual summarization, defined in Mani (Automatic summarization. Natural language processing. John Benjamins Publishing Company, Amsterdam, 2001) as “processing several languages, with a summary in the same language as input,” but their performance is usually unsatisfactory due to the exclusion of language-specific knowledge. Moreover, supervised machine learning approaches need training corpora in multiple languages, which are usually unavailable for rare languages, and their creation is a very expensive and labor-intensive process. In this article, we describe cross-lingual methods for training an extractive single-document text summarizer called MUSE (MUltilingual Sentence Extractor), a supervised approach based on the linear optimization of a rich set of sentence ranking measures using a Genetic Algorithm. We evaluated MUSE's performance on documents in three different languages (English, Hebrew, and Arabic) using several training scenarios. The summarization quality was measured using ROUGE-1 and ROUGE-2 Recall metrics. The results of the extensive comparative analysis showed that the performance of MUSE was better than that of the best-known multilingual approach (TextRank) in all three languages. Moreover, our experimental results suggest that using the same sentence ranking model across languages results in a reasonable summarization quality while saving considerable annotation effort for the end user. On the other hand, using parallel corpora generated by machine translation tools may improve the performance of a MUSE model trained on a foreign language. A comparative evaluation of an alternative optimization technique, Multiple Linear Regression, justifies the use of a Genetic Algorithm.
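To make the approach concrete, the sketch below illustrates the linear sentence-ranking idea in Python: each sentence receives a score that is a weighted sum of language-independent ranking measures, and the top-scoring sentences form the extractive summary. The feature names, values, and weights are hypothetical placeholders rather than MUSE's actual feature set; in MUSE, the weight vector would be learned by a Genetic Algorithm from an annotated (possibly foreign-language or machine-translated) corpus.

```python
# A minimal sketch of linear sentence ranking for extractive summarization.
# The features and weights below are hypothetical illustrations, not MUSE's
# actual feature set; MUSE learns the weight vector with a Genetic Algorithm
# on an annotated corpus, which may be in a different language than the input.
from typing import Dict, List


def score_sentence(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted linear combination of language-independent ranking measures."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def extract_summary(sentences: List[str],
                    sentence_features: List[Dict[str, float]],
                    weights: Dict[str, float],
                    max_sentences: int = 3) -> List[str]:
    """Rank sentences by score, keep the top ones, and restore document order."""
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score_sentence(sentence_features[i], weights),
                    reverse=True)
    return [sentences[i] for i in sorted(ranked[:max_sentences])]


if __name__ == "__main__":
    # Toy document with three illustrative, language-independent features per
    # sentence: position in the document, relative length, and term-frequency
    # coverage. All numbers are made up for demonstration purposes.
    sentences = ["First sentence.", "Second sentence.", "Third sentence."]
    features = [
        {"position": 1.0, "length": 0.4, "tf_coverage": 0.7},
        {"position": 0.5, "length": 0.9, "tf_coverage": 0.3},
        {"position": 0.1, "length": 0.6, "tf_coverage": 0.8},
    ]
    weights = {"position": 0.5, "length": 0.2, "tf_coverage": 0.3}  # learned by a GA in MUSE
    print(extract_summary(sentences, features, weights, max_sentences=2))
```

Under this formulation, cross-lingual training simply means learning the weights on a corpus annotated in one language and applying the same ranking model, unchanged, to documents in another language, which is the scenario evaluated in the article.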