Conference: ICAA 2014

Too Long-Didn’t Read: A Practical Web Based Approach towards Text Summarization

Abstract

In today’s digital age, people share and read a never-ending stream of varied electronic information, so either a great deal of time is spent working through it all or only a tiny fraction of it is actually read. A generic text summarization technique is therefore needed. In this paper, we propose a web-based, domain-independent automatic text summarization method. The method generates a summary of arbitrary length by extracting semantically important information from the document and assigning scores to it, based on term-frequency analysis and on tagging certain parts of speech, such as proper nouns, and signal words. Another important characteristic of our approach is that it takes the font semantics of the text (such as headings and emphasized text) into account when scoring the different entities of the document.
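The scoring idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no weights, word lists, or tagging details, so the stopword list, the `SIGNAL_WORDS` set, the numeric weights, the capitalization-based stand-in for proper-noun tagging, and the `emphasized` parameter (standing in for words taken from headings or emphasized markup) are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical word lists; the abstract does not specify any.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it", "that",
             "this", "for", "on", "as", "by", "with", "are", "was", "be"}
SIGNAL_WORDS = {"therefore", "important", "significant", "conclusion",
                "summary", "propose"}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Return alphabetic tokens, keeping their original capitalization."""
    return re.findall(r"[A-Za-z']+", sentence)


def score_sentences(text, emphasized=()):
    """Score each sentence from term frequency, proper nouns, signal words,
    and emphasized words. All weights below are assumed, not from the paper."""
    sentences = split_sentences(text)
    freq = Counter(
        w.lower()
        for s in sentences
        for w in tokenize(s)
        if w.lower() not in STOPWORDS
    )
    emphasized_lower = {e.lower() for e in emphasized}
    scores = []
    for s in sentences:
        words = tokenize(s)
        content = [w.lower() for w in words if w.lower() not in STOPWORDS]
        if not content:
            scores.append(0.0)
            continue
        # Average document-wide frequency of the sentence's content words.
        tf = sum(freq[w] for w in content) / len(content)
        # Crude proper-noun proxy: capitalized word that is not sentence-initial.
        proper = sum(1 for w in words[1:] if w[0].isupper())
        signal = sum(1 for w in content if w in SIGNAL_WORDS)
        emph = sum(1 for w in content if w in emphasized_lower)
        scores.append(tf + 1.0 * proper + 1.5 * signal + 2.0 * emph)
    return sentences, scores


def summarize(text, n=2, emphasized=()):
    """Return the n highest-scoring sentences in their original order."""
    sentences, scores = score_sentences(text, emphasized)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:n]
    return [sentences[i] for i in sorted(top)]
```

Calling `summarize(text, n=k)` yields a summary of `k` sentences, which mirrors the arbitrary-length summary mentioned in the abstract. A real system would replace the capitalization heuristic with a part-of-speech tagger and would extract the emphasized words from the page markup.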