Published in: 《计算机应用研究》 (Application Research of Computers)

Design and Implementation of a Topic-Specific Search Engine for Digital Libraries

Abstract

This paper presents an overall system design for a topic-specific search engine for digital libraries. A preprocessing system selects high-quality seed sites wherever possible and uses them to generate the Web topic definition data. Coordinated by a system controller, the topic crawlers synchronously collect the Web resources recommended by the crawlers, and the downloaded resources undergo text classification and topic identification. The downloaded Web resources are stored by discipline in a Web topic resource repository, indexed through a global information repository, and exposed through a general interface for topic-based retrieval. Drawing on the characteristics of digital libraries, the paper proposes a design for a multi-threaded topic crawler and a novel URL topic-relevance pruning algorithm, EPR, which together provide the key design basis for a prototype of the search engine. The final system was built by extending the open-source Lucene platform. The authors report that experimental results show the approach to be effective, in particular the EPR relevance-judgment algorithm, which they describe as innovative and of practical value.
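The abstract does not give the details of EPR, so the following is only a minimal sketch of the general idea of a multi-threaded topic crawler that prunes off-topic branches. Everything in it is an assumption made for illustration: the in-memory `FAKE_WEB` graph, the `TOPIC_TERMS` set, the term-overlap `relevance` score (a simple stand-in, not EPR), and the `threshold` value.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
FAKE_WEB = {
    "seed": ("digital library search engine", ["a", "b"]),
    "a": ("library catalog and digital archive index", ["c"]),
    "b": ("football scores and match reports", ["d"]),
    "c": ("search index for library collections", []),
    "d": ("digital library of sports", []),
}

# Hypothetical topic definition; the paper derives this from seed sites.
TOPIC_TERMS = {"digital", "library", "search", "index"}


def relevance(text, topic_terms=TOPIC_TERMS):
    """Fraction of topic terms found in the text (a stand-in score, not EPR)."""
    words = set(text.lower().split())
    return len(words & topic_terms) / len(topic_terms)


def fetch(url):
    """Simulated download: look the page up in the in-memory graph."""
    text, links = FAKE_WEB[url]
    return url, text, links


def crawl(seeds, threshold=0.5, workers=4):
    """Level-by-level crawl; pages scoring below threshold are pruned."""
    seen = set(seeds)
    frontier = list(seeds)
    kept = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier:
            next_frontier = []
            # Pages in the current frontier are fetched by worker threads.
            for url, text, links in pool.map(fetch, frontier):
                score = relevance(text)
                if score < threshold:
                    continue  # prune: skip the page and do not follow its links
                kept[url] = score
                for link in links:
                    if link not in seen and link in FAKE_WEB:
                        seen.add(link)
                        next_frontier.append(link)
            frontier = next_frontier
    return kept


if __name__ == "__main__":
    print(sorted(crawl(["seed"])))
```

In this toy graph the off-topic page `b` is pruned, so page `d` is never reached even though its own text would score as relevant. That is the usual trade-off of pruning at the link level: fewer wasted downloads at the cost of occasionally missing relevant pages behind irrelevant ones.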
