Journal: Mobile Networks & Applications

System Design of Cloud Search Engine Based on Rich Text Content

Abstract

To improve search performance over rich text content, this paper designs a cloud search engine system for rich text. The hardware extends a traditional search engine platform with a Solr index server, a collector, a Chinese word segmenter, and a searcher, and the data interfaces are adjusted accordingly. With this hardware and database support in place, the system uses the open-source Apache Tika framework to extract metadata from rich text documents, segments words according to the content and semantics of the text, and computes a weight for each keyword. Given the input search keywords, it builds a text index, uses the BM25 algorithm to compute the similarity between the keywords and each text, and outputs the rich text search results according to the computed similarity. Experimental results show that the designed system achieves a high recall rate and high throughput, and that the index construction time for each data item across different files is short, which improves search efficiency and search accuracy.
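The BM25 similarity step mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `bm25_scores`, the whitespace tokenization, the parameter defaults `k1=1.2` and `b=0.75`, and the smoothed IDF form are all assumptions chosen for illustration, and a real deployment would rely on the Chinese word segmenter and the Solr index rather than in-memory lists.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against `query_terms`.

    Hypothetical helper: `docs` is a non-empty list of token lists.
    k1 and b are commonly used defaults, not values from the paper.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        term_freq = Counter(d)
        score = 0.0
        for term in query_terms:
            f = term_freq[term]
            if f == 0:
                continue  # the term contributes nothing to this document
            n = doc_freq[term]
            # Smoothed IDF variant that never goes negative (an assumption).
            idf = math.log((n_docs - n + 0.5) / (n + 0.5) + 1.0)
            # Length normalization: longer documents are penalized via b.
            norm = f + k1 * (1.0 - b + b * len(d) / avg_len)
            score += idf * f * (k1 + 1.0) / norm
        scores.append(score)
    return scores


# Toy corpus; whitespace splitting stands in for real word segmentation.
docs = [
    "cloud search engine for rich text".split(),
    "rich text metadata extraction".split(),
    "hardware index server".split(),
]
scores = bm25_scores(["rich", "text"], docs)
# Document indices ordered from most to least similar to the query.
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

In this toy corpus the first two documents each contain both query terms once, so the shorter one scores higher because of the length normalization, and the third document scores zero.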
