
Automatic Annotation of Corpora for Text Summarisation: A Comparative Study



Abstract

This paper presents two methods that automatically produce annotated corpora for text summarisation on the basis of human-produced abstracts. Both methods identify the set of sentences from the document that best conveys the information in the human-produced abstract. The first method relies on a greedy algorithm, whilst the second uses a genetic algorithm. Both methods allow the number of sentences to be annotated to be specified, which is an advantage over existing methods. A comparison of the two approaches revealed that the genetic algorithm is appropriate when the number of sentences to be annotated is smaller than the number of sentences in an ideal gold standard with no length restrictions, whereas the greedy algorithm should be used in all other cases.
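The greedy variant can be illustrated with a minimal sketch. The abstract does not say how a candidate sentence set is scored against the human-produced abstract, so the bag-of-words cosine similarity used here, the stopping rule, and the names `bag_of_words`, `cosine` and `greedy_select` are all assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter
import math
import re


def bag_of_words(text):
    """Lower-cased word counts: a deliberately crude content representation (assumed)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def greedy_select(sentences, abstract, k):
    """Return up to k sentence indices whose combined text best matches the abstract.

    At each step, add the sentence that most increases the similarity between
    the selected set and the abstract; stop early if nothing improves it
    (this stopping rule is an assumption of the sketch).
    """
    target = bag_of_words(abstract)
    vectors = [bag_of_words(s) for s in sentences]
    chosen, combined, best_score = [], Counter(), 0.0
    while len(chosen) < k:
        candidates = [
            (cosine(combined + vectors[i], target), i)
            for i in range(len(sentences))
            if i not in chosen
        ]
        if not candidates:
            break
        # Highest score wins; ties go to the earlier sentence.
        score, index = max(candidates, key=lambda c: (c[0], -c[1]))
        if score <= best_score:
            break
        chosen.append(index)
        combined = combined + vectors[index]
        best_score = score
    return sorted(chosen)
```

The parameter `k` plays the role of the user-specified number of sentences to annotate. A genetic algorithm would instead search over whole candidate sets of size `k` with the same kind of scoring function, rather than committing to one sentence at a time.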
