Journal: Journal of Computational Intelligence and Electronic Systems

Proposing an Extractive Mono-Document Summarization System for Persian Language


Abstract

With the rapid growth of online text, tools that help users identify important content have become increasingly important. Automatic text summarization addresses this problem by taking an input text and extracting its most important content. However, determining the salience of information in a text depends on many factors and remains a key problem in automatic text summarization. Some studies in the literature use lexical chains as an indicator of lexical cohesion in the text and as an intermediate representation for summarization. Others use genetic algorithms to examine manually generated summaries and learn the textual patterns that lead to them, by identifying the features most correlated with human-generated summaries. In this study, we combine these two approaches. First, preprocessing operations (normalization, tokenization, stop-word removal, stemming, and POS tagging) are applied to the text, so that each sentence retains only its independent semantic words. Then, sentences are scored using a set of position, thematic, and coherence features. The final score of each sentence is a combination of these features. Each feature has its own weight, which must be determined in order to produce a good summary. For this reason, the system first goes through a learning phase in which a genetic algorithm determines each feature's weight. The next phase is the testing phase, in which the system receives new documents and uses Persian WordNet and lexical chains to extract deep-level knowledge about the text. This knowledge is combined with the results of other higher-level analyses. Finally, sentences are scored, sorted, and selected, and the summary is produced. We evaluated the proposed system using two methods: (1) precision/recall and (2) TabEval, a new evaluation tool for Persian text summarizers. We compared our system with two other Persian summarizers, FarsiSum and Ijaz. The results showed that our system outperformed both (i.e., it achieved a higher average precision/recall and the best average TabEval score).
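The scoring and evaluation steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the feature values, the weights in `WEIGHTS`, and the function names `score_sentence`, `summarize`, and `precision_recall` are all hypothetical. In particular, the weights are fixed by hand here, whereas the abstract says they are learned by a genetic algorithm, and the combination is assumed to be a weighted sum, which the abstract does not specify.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical feature weights. The abstract says the weights are learned by a
# genetic algorithm; they are fixed by hand here purely for illustration.
WEIGHTS: Dict[str, float] = {"position": 0.5, "thematic": 0.3, "coherence": 0.2}


def score_sentence(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine feature values (assumed to lie in [0, 1]) as a weighted sum."""
    return sum(weights[name] * features.get(name, 0.0) for name in weights)


def summarize(rows: Sequence[Dict[str, float]],
              weights: Dict[str, float],
              k: int) -> List[int]:
    """Return the indices of the k best-scoring sentences, in document order."""
    ranked = sorted(range(len(rows)),
                    key=lambda i: score_sentence(rows[i], weights),
                    reverse=True)
    return sorted(ranked[:k])


def precision_recall(selected: Sequence[int],
                     reference: Sequence[int]) -> Tuple[float, float]:
    """Sentence-level precision and recall against a reference extract."""
    sel, ref = set(selected), set(reference)
    overlap = len(sel & ref)
    precision = overlap / len(sel) if sel else 0.0
    recall = overlap / len(ref) if ref else 0.0
    return precision, recall


# Made-up feature values for a four-sentence document.
rows = [
    {"position": 1.0, "thematic": 0.2, "coherence": 0.1},
    {"position": 0.5, "thematic": 0.9, "coherence": 0.8},
    {"position": 0.2, "thematic": 0.1, "coherence": 0.3},
    {"position": 0.0, "thematic": 0.8, "coherence": 0.9},
]
summary = summarize(rows, WEIGHTS, k=2)
# Made-up human reference extract (sentence indices).
precision, recall = precision_recall(summary, reference=[1, 3])
```

In this sketch a learning phase would search over `WEIGHTS` to maximize agreement with human extracts; the lexical-chain and WordNet analysis would feed into the `coherence` and `thematic` values, which are simply given as inputs here.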
