International Conference on Information Retrieval and Knowledge Management

Automatic Text Summarization for Malay News Documents Using Latent Dirichlet Allocation and Sentence Selection Algorithm

Abstract

The proliferation of online newspapers has made automatic text summarization a necessity: readers need a summary that captures most of the important information in the original document. This study focuses on keyword extraction using Latent Dirichlet Allocation (LDA) and on a sentence selection algorithm that uses a rule-based concept approach to produce an extractive summary. To evaluate the system, 100 Malay news documents covering general news, sports, health and technology were collected from Utusan Online. The study used a single topic from LDA and took the top 10 words of the selected topic as the keywords. For evaluation, the summaries generated by the system were compared with summaries written by a human expert using precision and recall. The system achieved a best score of 62.7%, suggesting that its summaries can help people read Malay news documents in less time, because a summary lets readers grasp the important parts of a document without reading all of it.
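The abstract does not say how LDA was configured, so the following is only a minimal pure-Python sketch of the keyword-extraction step under stated assumptions. It runs a collapsed Gibbs sampler in which each sentence of an article is treated as a pseudo-document of already lowercased, stopword-filtered tokens; the "selected topic" is assumed to be the topic holding the most tokens; and that topic's 10 most frequent words are returned as keywords. The function name `lda_keywords` and the hyperparameters (`alpha`, `beta`, `iterations`, `seed`) are hypothetical choices, not values from the paper.

```python
import random
from collections import Counter


def lda_keywords(docs, num_topics=1, top_n=10, iterations=200,
                 alpha=0.1, beta=0.01, seed=0):
    """Hypothetical sketch: collapsed Gibbs LDA, then top_n words of the dominant topic.

    docs: list of token lists (assumed lowercased and stopword-filtered upstream).
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]          # n_dk
    topic_word = [Counter() for _ in range(num_topics)]   # n_kw
    topic_total = [0] * num_topics                        # n_k
    topics = range(num_topics)
    assignments = []

    # Give every token a random initial topic and record the counts.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Full conditional p(z = t | rest), up to a normalizing constant.
                weights = [
                    (doc_topic[d][t] + alpha) * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Assumption: the "selected topic" is the one with the most assigned tokens.
    best = max(topics, key=lambda t: topic_total[t])
    # Rank words by count (high first), alphabetically on ties, for determinism.
    ranked = sorted((w for w, c in topic_word[best].items() if c > 0),
                    key=lambda w: (-topic_word[best][w], w))
    return ranked[:top_n]


# Toy pseudo-documents (one per sentence), invented for illustration.
article = [
    ["bajet", "bajet", "bajet", "kerajaan", "kerajaan", "kesihatan"],
    ["bajet", "menteri"],
]
print(lda_keywords(article))  # ['bajet', 'kerajaan', 'kesihatan', 'menteri']
```

With `num_topics=1` every token is assigned to the same topic, so the keywords reduce to the most frequent terms; with more topics the result depends on the sampler's random state. In practice a library implementation such as gensim's `LdaModel` would usually replace this hand-rolled sampler.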

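The abstract also leaves the "rule-based concept approach" for sentence selection and the exact precision/recall setup unspecified. The sketch below illustrates one hypothetical rule: score each sentence by how many keyword tokens it contains, keep the `k` highest-scoring sentences in their original order, then compute sentence-level precision and recall against a human reference summary. Keyword-count scoring, tie-breaking by position, sentence-level matching, and the names `select_sentences` and `precision_recall` are all assumptions for illustration.

```python
import re


def tokenize(text):
    # Lowercase word tokens; Malay is written in the Latin alphabet, so \w+ suffices here.
    return re.findall(r"\w+", text.lower())


def select_sentences(sentences, keywords, k):
    # Hypothetical rule: a sentence's score is the number of keyword tokens it contains.
    kw = {w.lower() for w in keywords}
    scores = [sum(1 for tok in tokenize(s) if tok in kw) for s in sentences]
    # Rank by score (high first); ties go to the sentence that appears earlier.
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    # Return the chosen sentences in their original order (extractive summary).
    return [sentences[i] for i in sorted(ranked[:k])]


def precision_recall(system, reference):
    # Sentence-level overlap between the system summary and the human summary.
    sys_set, ref_set = set(system), set(reference)
    overlap = len(sys_set & ref_set)
    precision = overlap / len(sys_set) if sys_set else 0.0
    recall = overlap / len(ref_set) if ref_set else 0.0
    return precision, recall


# Toy article and human summary, invented for illustration.
article = [
    "Kerajaan mengumumkan bajet baharu hari ini.",
    "Cuaca cerah di ibu negara.",
    "Bajet itu memberi fokus kepada kesihatan.",
    "Menteri berkata bajet kesihatan dan bajet pendidikan meningkat.",
]
keywords = ["bajet", "kesihatan", "kerajaan"]
human = [article[0], article[2]]

summary = select_sentences(article, keywords, k=2)
p, r = precision_recall(summary, human)
print(p, r)  # 0.5 0.5
```

Here the system keeps the first and last sentences, while the human kept the first and third, so one of two sentences overlaps in each direction. A word-level variant (counting overlapping tokens rather than whole sentences) is an equally plausible reading of "Precision Recall formula" and would give different numbers.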