Topic generation for web document summarization

Abstract

Over the past decade, more and more Internet users have come to rely on search engines to find the information they need. However, what they find depends to a large extent on the ranking mechanism of the search engine they use. Not surprisingly, the results generally include a large amount of completely irrelevant information. To help users find what they are looking for quickly, we propose DISCO (Distribution Scoring), an efficient algorithm for building summaries of the collection of documents that a search engine returns in response to a user query. To demonstrate the performance of the proposed algorithm, the Reuters-21578 text categorization collection and some search results from Google are used in our simulation. Several metrics, such as coverage, overlap, and computation time, are employed to evaluate the quality of the summaries and the efficiency of the algorithm. All our simulation results indicate that the proposed algorithm outperforms all the existing algorithms in terms of both the usefulness of the summaries and the running time.