Conference: ACM Conference on Digital Libraries

Scalable Collection Summarization and Selection

Abstract

Information retrieval over the Internet increasingly requires filtering thousands of information sources. As the number and variety of sources increase, new ways of automatically summarizing, discovering, and selecting sources relevant to a user's query are needed. Pharos is a highly scalable distributed architecture for locating heterogeneous information sources. Its design is hierarchical, which allows it to scale well as the number of information sources grows. We demonstrate the feasibility of the Pharos architecture using 2500 Usenet newsgroups as separate collections. Each newsgroup is summarized via automated Library of Congress classification. We show that using Pharos as an intermediate retrieval mechanism provides acceptable source-selection accuracy compared to selecting sources using complete classification information, while maintaining good scalability. This implies that hierarchical distributed metadata and automated classification are potentially useful paradigms for addressing scalability problems in large-scale distributed information retrieval applications.
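The abstract does not spell out how the hierarchy is used at query time, so the following is only a minimal sketch of one way hierarchical source selection over classification summaries could work, not the Pharos implementation. It assumes each collection is summarized as document counts per Library of Congress class, that an upper level of the hierarchy holds a coarsened copy of that summary (here, the first letter of each class), and that a query has already been mapped to classes. The collection names, class labels, counts, and the functions `coarsen` and `select_sources` are all hypothetical.

```python
from collections import Counter

# Hypothetical summaries: document counts per Library of Congress class.
# Newsgroup names, class labels, and counts are invented for illustration.
SUMMARIES = {
    "comp.lang.c": Counter({"QA76": 120, "TK5105": 10}),
    "sci.med": Counter({"RC": 90, "QP": 30}),
    "rec.music.classical": Counter({"ML": 80, "MT": 15}),
}


def coarsen(summary, depth=1):
    """Roll a detailed summary up to the first `depth` characters of each class."""
    coarse = Counter()
    for lc_class, count in summary.items():
        coarse[lc_class[:depth]] += count
    return coarse


def select_sources(query_classes, summaries, top_k=2, depth=1):
    """Two-stage selection: coarse filter first, then rank on full summaries."""
    query_coarse = {c[:depth] for c in query_classes}

    # Stage 1 (assumed upper level): keep collections whose coarse summary
    # overlaps the coarse classes of the query.
    candidates = [
        name
        for name, summary in summaries.items()
        if any(coarsen(summary, depth)[c] > 0 for c in query_coarse)
    ]

    # Stage 2 (assumed lower level): rank surviving collections by the
    # fraction of their documents that fall in the query's detailed classes.
    def score(name):
        summary = summaries[name]
        return sum(summary[c] for c in query_classes) / sum(summary.values())

    return sorted(candidates, key=score, reverse=True)[:top_k]


print(select_sources(["QA76"], SUMMARIES))
```

In this toy data the coarse filter admits `sci.med` because it shares the top-level class `Q` with the query, and the detailed ranking then places it below `comp.lang.c`. That is the kind of trade-off the abstract describes: coarse summaries are cheaper to store and consult at higher levels, at some cost in selection accuracy relative to complete classification information.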