Journal: Physica A: Statistical Mechanics and its Applications

Extractive summarization using complex networks and syntactic dependency

Abstract

The realization that statistical physics methods can be applied to analyze written texts represented as complex networks has led to several developments in natural language processing, including automatic summarization and evaluation of machine translation. Most importantly, so far only a few metrics of complex networks have been used and therefore there is ample opportunity to enhance the statistics-based methods as new measures of network topology and dynamics are created. In this paper, we employ for the first time the metrics betweenness, vulnerability and diversity to analyze written texts in Brazilian Portuguese. Using strategies based on diversity metrics, a better performance in automatic summarization is achieved in comparison to previous work employing complex networks. With an optimized method the Rouge score (an automatic evaluation method used in summarization) was 0.5089, which is the best value ever achieved for an extractive summarizer with statistical methods based on complex networks for Brazilian Portuguese. Furthermore, the diversity metric can detect keywords with high precision, which is why we believe it is suitable to produce good summaries. It is also shown that incorporating linguistic knowledge through a syntactic parser does enhance the performance of the automatic summarizers, as expected, but the increase in the Rouge score is only minor. These results reinforce the suitability of complex network methods for improving automatic summarizers in particular, and treating text in general.
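
The general idea of ranking sentences by a network metric can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not say how the network is built, so the sketch assumes that nodes are sentences and that two sentences are linked when they share at least one word. It scores sentences by betweenness only (vulnerability and diversity are not shown), and the function names `build_sentence_graph`, `betweenness` and `summarize` are hypothetical.

```python
from collections import deque
from itertools import combinations


def build_sentence_graph(sentences):
    """Assumed linking rule: two sentences are connected if they share a token."""
    tokens = [set(s.lower().split()) for s in sentences]
    graph = {i: set() for i in range(len(sentences))}
    for i, j in combinations(range(len(sentences)), 2):
        if tokens[i] & tokens[j]:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def betweenness(graph):
    """Unnormalized betweenness for an undirected, unweighted graph (Brandes' algorithm)."""
    score = {v: 0.0 for v in graph}
    for source in graph:
        stack = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}   # number of shortest paths from source
        dist = {v: -1 for v in graph}   # -1 marks an unvisited node
        sigma[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:                    # breadth-first search from source
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        while stack:                    # accumulate dependencies, farthest nodes first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # every undirected pair was counted once from each endpoint
    return {v: s / 2 for v, s in score.items()}


def summarize(sentences, k):
    """Return the k highest-betweenness sentences, in their original order."""
    graph = build_sentence_graph(sentences)
    scores = betweenness(graph)
    top = sorted(graph, key=lambda v: (-scores[v], v))[:k]
    return [sentences[i] for i in sorted(top)]
```

For example, `summarize(["cats chase mice", "mice eat cheese", "cheese smells strong"], 1)` selects the middle sentence, because it is the only one lying on a shortest path between the other two. A real system would add lemmatization and stopword removal before linking sentences, and would be evaluated with a ROUGE score as the abstract describes.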
