International Conference on Mining Intelligence and Knowledge Exploration

Hybrid Algorithm for Multilingual Summarization of Hindi and Punjabi Documents


Abstract

This paper presents a hybrid algorithm for the multilingual summarization of Hindi and Punjabi documents. It combines the features of the Hindi summarizer proposed by CDAC Noida with those of the Punjabi summarizer proposed by Gupta and Lehal in 2012, and it also introduces some new features for summarizing multilingual Hindi and Punjabi text. This is the first multilingual text summarizer proposed that supports both Hindi and Punjabi text. The algorithm uses nine features to summarize multilingual Hindi and Punjabi text: 1) key-phrase extraction, 2) font feature, 3) noun and verb extraction, 4) position feature, 5) cue-phrase feature, 6) negative-keyword extraction, 7) named-entity extraction, 8) relative-length feature, and 9) number-data extraction. For each sentence, a score is calculated for every feature, and machine-learning-based mathematical regression is then applied to identify the weights of the nine features. Final sentence scores are calculated from the feature-weight equations. The top-scored sentences are selected for the final summary and kept in the same order in which they appear in the input. By default, the summary is produced at a 30% compression ratio, at which the algorithm performs well on both intrinsic and extrinsic measures of summary evaluation. The algorithm has been thoroughly tested on 30 Hindi-Punjabi documents and reports an F-score of 92.56%, which is reasonably good.
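The scoring and selection steps described above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the authors' implementation. The abstract does not name the regression model, so ordinary least squares with a tiny ridge term (added only to keep the normal equations solvable) stands in for it. The per-sentence feature scores and the training labels (for example, 1.0 for sentences in a human reference summary, 0.0 otherwise) are assumed to be given, since the feature extractors are not described here. "30% compression ratio" is read as keeping 30% of the sentences, and the F-score is assumed to be sentence-overlap F1 against a reference summary. The names `fit_weights`, `sentence_scores`, `summarize`, and `sentence_f1` are hypothetical.

```python
from typing import List, Sequence, Set

# The nine features named in the abstract. Their extractors are not shown;
# each sentence is assumed to arrive as a vector of nine feature scores in this order.
FEATURES = [
    "key_phrase", "font", "nouns_verbs", "position", "cue_phrase",
    "negative_keywords", "named_entities", "relative_length", "number_data",
]


def fit_weights(X: Sequence[Sequence[float]], y: Sequence[float],
                ridge: float = 1e-6) -> List[float]:
    """Least-squares feature weights from (X^T X + ridge*I) w = X^T y.

    Assumption: OLS with a tiny ridge term stands in for the paper's
    unspecified "mathematical regression"; ridge > 0 keeps the system solvable.
    """
    n = len(X[0])
    # Augmented matrix [A | b] with A = X^T X + ridge*I and b = X^T y.
    A = [[sum(row[i] * row[j] for row in X) + (ridge if i == j else 0.0)
          for j in range(n)] + [sum(row[i] * t for row, t in zip(X, y))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(n):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]


def sentence_scores(X: Sequence[Sequence[float]], w: Sequence[float]) -> List[float]:
    """Final score of each sentence: weighted sum of its feature scores."""
    return [sum(wi * xi for wi, xi in zip(w, row)) for row in X]


def summarize(sentences: Sequence[str], X: Sequence[Sequence[float]],
              w: Sequence[float], ratio: float = 0.30) -> List[str]:
    """Keep the top-scoring `ratio` share of sentences, in input order.

    Assumption: the compression ratio is measured in sentences, not words.
    """
    k = max(1, round(ratio * len(sentences)))
    scores = sentence_scores(X, w)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]  # restore original input order


def sentence_f1(selected: Set[str], reference: Set[str]) -> float:
    """Sentence-overlap F-score of a system summary against a reference summary."""
    overlap = len(selected & reference)
    if overlap == 0:
        return 0.0
    precision = overlap / len(selected)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)
```

For example, with a single feature whose scores for ten sentences are 0.1, 0.9, 0.3, 0.8, 0.2, 0.7, 0.0, 0.4, 0.5, 0.6 and a weight of 1.0, `summarize` keeps three sentences (indices 1, 3, and 5) in input order. Sorting the selected indices, rather than leaving them in score order, is what preserves the original sentence order the abstract calls for.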
