Home > Foreign-language journals > Chemical geology > Graph-Based Extractive Arabic Text Summarization Using Multiple Morphological Analyzers

Graph-Based Extractive Arabic Text Summarization Using Multiple Morphological Analyzers


Abstract

This paper investigates whether multi-morphological analysis improves the performance of the graph-based approach to extractive Arabic text summarization (ATS). This approach represents the text document as a graph in which sentences are the nodes and the relationships between sentences are the edge weights. These weights measure the similarity between the connected sentences, which is traditionally calculated using cosine similarity on the basis of term frequency-inverse document frequency (TF-IDF). The performance of graph-based ATS is still low because calculating these weights is very challenging for Arabic, for the following reasons: the complex morphological structure of the language, the absence of capital letters and diacritics, and the variable order of words in the sentence. In this study, the sum of the cosine similarity and the mutual nouns between the connected sentences is chosen as the measure that represents the edge weights. Nouns were chosen because the more nouns a sentence contains, the more information it carries; we therefore assume that using nouns leads to an improvement in the final summary. To overcome these limitations of Arabic when calculating the proposed measure, the study investigates how the choice of morphological analyzer used to extract nouns from each sentence affects ATS accuracy. Three morphological analyzers are proposed to enhance the performance of the graph-based ATS system: BAMA, Safar Alkhalil, and Stanford NLP. First, a graph-based ATS system was constructed; its input is a text document and its output is a summary. Then redundant sentences were removed according to a sentence-overlap criterion. To evaluate the impact of the different morphological analyzers on the proposed summarization approach, the EASC corpus is used as a standard dataset. The results show that the Safar Alkhalil morphological analyzer gives the best performance among the three proposed analyzers.
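
The edge-weight measure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions because the abstract does not specify them: "mutual nouns" is taken as the raw count of nouns shared by two sentences, sentences are ranked by weighted degree (the sum of their edge weights), and redundancy is judged by a token-overlap ratio with a hypothetical threshold `max_overlap`. The noun lists are assumed to come from a morphological analyzer such as BAMA, Safar Alkhalil, or Stanford NLP; no analyzer is called here, and all function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(sentences):
    """Return one {term: weight} dict per tokenized sentence (smoothed IDF)."""
    n = len(sentences)
    df = Counter(term for tokens in sentences for term in set(tokens))
    vectors = []
    for tokens in sentences:
        if not tokens:
            vectors.append({})
            continue
        tf = Counter(tokens)
        vectors.append({
            t: (c / len(tokens)) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, c in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm = math.sqrt(sum(w * w for w in u.values())) * \
        math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def edge_weights(sentences, nouns):
    """Edge weight = TF-IDF cosine similarity + number of shared nouns.

    Assumption: "mutual nouns" is the raw count of nouns in common.
    """
    vecs = tfidf_vectors(sentences)
    n = len(sentences)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            shared = len(set(nouns[i]) & set(nouns[j]))
            weights[i][j] = weights[j][i] = cosine(vecs[i], vecs[j]) + shared
    return weights


def rank_sentences(weights):
    """Assumption: rank nodes by weighted degree, highest first."""
    scores = [sum(row) for row in weights]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def summarize(sentences, nouns, k=2, max_overlap=0.5):
    """Pick up to k top-ranked sentences, skipping ones that overlap too much
    with a sentence already chosen (hypothetical overlap criterion)."""
    chosen = []
    for i in rank_sentences(edge_weights(sentences, nouns)):
        tokens = set(sentences[i])
        redundant = any(
            len(tokens & set(sentences[j]))
            / max(1, min(len(tokens), len(set(sentences[j])))) > max_overlap
            for j in chosen
        )
        if not redundant:
            chosen.append(i)
        if len(chosen) == k:
            break
    return sorted(chosen)  # restore document order
```

Here `sentences` is a list of token lists and `nouns` is a parallel list holding the nouns extracted from each sentence. Swapping the analyzer changes only the `nouns` input, which is the variable the study compares.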
