International Conference on Mathematics: Pure, Applied and Computation

Implementation of The Common Phrase Index Method on The Phrase Query for Information Retrieval

Abstract

With the development of technology, finding information in news text has become easy, because news is distributed not only in print media such as newspapers but also in electronic media that can be accessed through search engines. When searching for relevant documents with a search engine, a phrase is often used as the query. The number of words that make up the phrase query and their positions affect the relevance of the documents returned, and therefore the accuracy of the information obtained. Given this problem, the purpose of this research was to analyze the implementation of the common phrase index method in information retrieval. The research was conducted on English news text and implemented in a prototype to determine the relevance of the documents returned. The system is built in four stages: pre-processing, indexing, term weighting, and cosine similarity calculation. The system then displays the search results ranked by cosine similarity. System testing was conducted using 100 documents and 20 queries, and the results were used in the evaluation stage. First, the relevant documents were determined using the kappa statistic. Second, the system success rate was determined using precision, recall, and F-measure. The kappa statistic was 0.71, so the relevant documents are eligible for use in the system evaluation. The evaluation yields a precision of 0.37, a recall of 0.50, and an F-measure of 0.43. From this result it can be said that the success rate of the system in producing relevant documents is low.
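
The ranking and evaluation steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' prototype and does not implement the common phrase index: it weights single-word terms with a smoothed TF-IDF (an assumed weighting variant), ranks documents by cosine similarity, and computes the F-measure as the harmonic mean of precision and recall (also an assumption, although it agrees with the reported figures). The function names and the toy documents are hypothetical.

```python
import math
from collections import Counter


def build_idf(docs):
    """Inverse document frequency per term; the +1.0 smoothing is an assumption."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) + 1.0 for term, count in df.items()}


def tf_idf(tokens, idf):
    """Sparse TF-IDF vector as a dict; terms unknown to the index are ignored."""
    return {t: tf * idf[t] for t, tf in Counter(tokens).items() if t in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank(query, docs):
    """Return (document index, score) pairs sorted by descending similarity."""
    idf = build_idf(docs)
    q_vec = tf_idf(query, idf)
    scores = [(i, cosine(q_vec, tf_idf(doc, idf))) for i, doc in enumerate(docs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (balanced F-measure)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical, already pre-processed (tokenized, lowercased) documents.
    docs = [
        ["stock", "market", "falls"],
        ["market", "rally", "continues"],
        ["football", "match", "result"],
    ]
    for index, score in rank(["stock", "market"], docs):
        print(index, round(score, 3))
    # Precision and recall values as reported in the abstract.
    print(round(f_measure(0.37, 0.50), 2))
```

With the reported precision of 0.37 and recall of 0.50, the harmonic mean is 2 × 0.37 × 0.50 / 0.87 ≈ 0.43, which matches the F-measure stated in the abstract.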
