Conference: International Conference on Engineering and Emerging Technologies

ISUTD: Intelligent System for Urdu Text De-Summarization


Abstract

Text de-summarization is a method of expanding a document while explaining the substantial points of its text. Manually explaining the central subject of a large article is a very difficult task for humans. De-summarization can be divided into two branches: abstractive and extractive approaches. The extractive approach collects the important paragraphs or sentences from the original document and presents them as an explanation. Urdu inherits a large part of its vocabulary from Arabic, Persian, and the native languages of South Asia. As a result, Urdu has a complex morphology. In terms of syntax, it has a relatively free word order (Subject, Object, and Verb). Despite being spoken by millions of people, Urdu is an under-resourced language in terms of available computational resources. We extend the single-document extractive de-summarization methodology to Urdu, based on the sentence weight algorithm, especially for topics such as news, sports, and health. We process the document through preprocessing (sentence segmentation, tokenization, stop-word removal, and lemmatization) and then apply the sentence weight algorithm.
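
The pipeline named in the abstract (sentence segmentation, tokenization, stop-word removal, lemmatization, then sentence weighting) can be illustrated with a minimal Python sketch. The abstract does not give the weight formula, so this sketch assumes a common frequency-based weighting: the weight of a sentence is the average document frequency of its content words. The stop-word list, the identity lemmatizer, and all function names are hypothetical placeholders for illustration, not the authors' implementation.

```python
from collections import Counter
import re

# Hypothetical, deliberately tiny stop-word list (an assumption; a real
# system would use a much fuller Urdu stop-word resource).
STOP_WORDS = {"ہے", "ہیں", "کا", "کی", "کے", "اور", "سے", "کو", "پر", "نے"}

# Punctuation stripped from token edges (Urdu comma, semicolon, and some ASCII marks).
PUNCTUATION = "،؛:\"'()"


def split_sentences(text):
    """Segment on the Urdu full stop (U+06D4), the Arabic question mark (U+061F), and '!'."""
    parts = re.split(r"[\u06D4\u061F!]", text)
    return [p.strip() for p in parts if p.strip()]


def lemmatize(token):
    """Placeholder lemmatizer (identity). A real Urdu lemmatizer is assumed to exist elsewhere."""
    return token


def content_words(sentence):
    """Whitespace-tokenize, strip edge punctuation, drop stop words, then lemmatize."""
    tokens = [t.strip(PUNCTUATION) for t in sentence.split()]
    return [lemmatize(t) for t in tokens if t and t not in STOP_WORDS]


def sentence_weights(text):
    """Return (sentences, weights); weight = mean document frequency of content words (assumed formula)."""
    sentences = split_sentences(text)
    words_per_sentence = [content_words(s) for s in sentences]
    freq = Counter(w for words in words_per_sentence for w in words)
    weights = [
        sum(freq[w] for w in words) / len(words) if words else 0.0
        for words in words_per_sentence
    ]
    return sentences, weights


def select_sentences(text, k=2):
    """Pick the k highest-weighted sentences, returned in their original document order."""
    sentences, weights = sentence_weights(text)
    top = sorted(range(len(sentences)), key=lambda i: weights[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Calling `select_sentences(text, k=2)` on a short Urdu passage returns the two sentences whose content words occur most often across the document. Normalizing by sentence length is a design choice that keeps long sentences from dominating purely because they contain more words.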
