Conference: WASE Global Conference on Science Engineering

Design and Implementation of Full-Text Retrieval System for People's Daily Annotated Corpus

Abstract

In this paper, we design and implement an efficient full-text retrieval system for the basic annotated People's Daily Corpus, based on inverted index technology. Guided by the characteristics of the corpus data, we analyze in detail the methods and strategies for implementing the system. After comparing several candidate schemes, we propose a three-level index structure of Chinese character, word, and address set, and give the design of the dictionary structure at each index level. After converting the unstructured People's Daily Corpus into structured index data, we implement a full-text search algorithm that corresponds to the proposed index structure. Experimental results show that the proposed search algorithm achieves the target of "ten million Chinese characters, response within one second", improving the speed of full-text search over the People's Daily Corpus.
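To make the three-level idea concrete, here is a minimal sketch of one way a character, word, and address-set index could be laid out. It is not the authors' implementation: the abstract does not specify the dictionary structures, so the nested-dictionary layout, the `word/tag` token format, the `(line_no, token_no)` address form, and the names `build_index` and `search` are all assumptions made for illustration.

```python
from collections import defaultdict


def build_index(lines):
    """Build a hypothetical three-level index:
    first character -> word -> set of (line_no, token_no) addresses.
    Assumes each line holds space-separated tokens of the form word/tag."""
    index = defaultdict(lambda: defaultdict(set))
    for line_no, line in enumerate(lines):
        for token_no, token in enumerate(line.split()):
            word = token.rsplit("/", 1)[0]  # drop the part-of-speech tag
            if word:
                index[word[0]][word].add((line_no, token_no))
    return index


def search(index, word):
    """Return the sorted addresses of `word`, or [] if it is not indexed."""
    if not word:
        return []
    words = index.get(word[0])  # level 1: look up by first character
    if words is None:
        return []
    return sorted(words.get(word, ()))  # levels 2 and 3: word -> address set


# Toy corpus in the assumed word/tag format (illustrative data only).
corpus = [
    "人民/n 日报/n 语料库/n",
    "全文/n 检索/v 系统/n",
    "人民/n 检索/v 人民币/n",
]
index = build_index(corpus)
print(search(index, "人民"))  # [(0, 0), (2, 0)]
print(sorted(index["人"]))    # ['人民', '人民币']
```

In this sketch a lookup touches only the words that share the query's first character, which is one plausible reason for placing a character level above the word level; the paper's actual motivation and structures may differ.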
