Published in: International Conference of Artificial Intelligence and Information Technology

Information Retrieval System for Searching JSON Files with Vector Space Model Method


Abstract

The purpose of this research is to build a retrieval system for searching files or documents in JSON (JavaScript Object Notation) format. The JSON files searched are generated from journal files by extracting their titles and keywords. The retrieval system uses the Vector Space Model method to obtain optimal search results by measuring the similarity between the query words and the documents. The system is written in Java, an object-oriented programming language, so that it can be deployed on many platforms and operating systems. The system reads all the words in each JSON file and removes those that appear in the stopword list. The remaining words are then stemmed to find their base forms. The stemming results are indexed to record the frequency of each word in each document, and the index is kept in memory to speed up searching. When a user searches, the system looks up the query words in the index, weights each query word with TF-IDF, and computes its similarity to the related documents with the cosine similarity formula. All related documents are then sorted in descending order of similarity score. A word search in this system takes roughly 150-300 milliseconds over a collection of 140 JSON documents.
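The pipeline described in the abstract (stopword filtering, stemming, in-memory indexing of term frequencies, TF-IDF weighting, cosine similarity, descending sort) can be illustrated with a minimal Java sketch. This is not the authors' code. The class name `VsmSearchSketch`, its methods, the tiny stopword list, and the `log(N / df)` form of IDF are all assumptions made for illustration, since the abstract does not specify them. The `stem` method is a deliberately crude placeholder that only strips a trailing "s", and documents are passed in as plain strings rather than parsed from JSON files.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a Vector Space Model search; all names are illustrative.
final class VsmSearchSketch {
    // Assumed stopword list; the paper's actual list is not given.
    static final Set<String> STOPWORDS = Set.of("a", "an", "and", "for", "in", "of", "the", "with");

    // In-memory index: term -> (document id -> raw term frequency).
    final Map<String, Map<Integer, Integer>> index = new HashMap<>();
    final List<String> docs = new ArrayList<>();

    // Placeholder stemmer: strips one trailing "s". A real system would use a proper stemmer.
    static String stem(String w) {
        return (w.length() > 3 && w.endsWith("s")) ? w.substring(0, w.length() - 1) : w;
    }

    // Lowercase, split on non-alphanumerics, drop stopwords, stem what remains.
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!w.isEmpty() && !STOPWORDS.contains(w)) out.add(stem(w));
        }
        return out;
    }

    // Index one document (here the extracted title/keyword text as a plain string).
    void add(String text) {
        int id = docs.size();
        docs.add(text);
        for (String t : tokenize(text)) {
            index.computeIfAbsent(t, k -> new HashMap<>()).merge(id, 1, Integer::sum);
        }
    }

    // Assumed IDF variant: ln(N / df); unknown terms get weight 0.
    double idf(String term) {
        Map<Integer, Integer> postings = index.get(term);
        return postings == null ? 0.0 : Math.log((double) docs.size() / postings.size());
    }

    // Returns (document id, cosine score) pairs for matching documents, best first.
    List<Map.Entry<Integer, Double>> search(String query) {
        Map<String, Integer> queryTf = new HashMap<>();
        for (String t : tokenize(query)) queryTf.merge(t, 1, Integer::sum);

        // Dot product of query and document TF-IDF vectors, plus squared query norm.
        double[] dot = new double[docs.size()];
        double queryNormSq = 0.0;
        for (Map.Entry<String, Integer> q : queryTf.entrySet()) {
            double idf = idf(q.getKey());
            double qw = q.getValue() * idf;
            queryNormSq += qw * qw;
            for (Map.Entry<Integer, Integer> p : index.getOrDefault(q.getKey(), Map.of()).entrySet()) {
                dot[p.getKey()] += qw * p.getValue() * idf;
            }
        }

        // Squared norm of every document vector (recomputed per query for brevity).
        double[] docNormSq = new double[docs.size()];
        for (Map.Entry<String, Map<Integer, Integer>> e : index.entrySet()) {
            double idf = idf(e.getKey());
            for (Map.Entry<Integer, Integer> p : e.getValue().entrySet()) {
                double w = p.getValue() * idf;
                docNormSq[p.getKey()] += w * w;
            }
        }

        List<Map.Entry<Integer, Double>> hits = new ArrayList<>();
        for (int d = 0; d < dot.length; d++) {
            // dot > 0 implies both norms are non-zero, so the division is safe.
            if (dot[d] > 0) hits.add(Map.entry(d, dot[d] / Math.sqrt(queryNormSq * docNormSq[d])));
        }
        hits.sort(Map.Entry.<Integer, Double>comparingByValue().reversed());
        return hits;
    }

    public static void main(String[] args) {
        VsmSearchSketch s = new VsmSearchSketch();
        s.add("Information retrieval with vector space model");
        s.add("Searching JSON files");
        s.add("Deep learning for image classification");
        for (Map.Entry<Integer, Double> hit : s.search("vector space retrieval JSON")) {
            System.out.println(hit.getValue() + "  " + s.docs.get(hit.getKey()));
        }
    }
}
```

In this sketch the document norms are recomputed on every query to keep the code short. Since the abstract says the index is kept in memory to speed up searching, a fuller implementation would likely cache those norms at indexing time.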
