Journal: Computer Science & Information Technology

An Effective Tokenization Algorithm for Information Retrieval Systems

Abstract

On the web, the amount of operational data has been increasing exponentially over the past few decades, and the expectations of data users are changing proportionally as well. Data users expect deeper, more exact, and more detailed results. Retrieval of relevant results is always affected by the pattern in which documents are stored and indexed. Various techniques have been designed to index documents, and indexing is done on the tokens identified within the documents. The tokenization process is primarily concerned with identifying the tokens and their counts. In this paper, we propose an effective tokenization approach based on a training vector, and the results show the efficiency and effectiveness of the proposed algorithm. Tokenization of a given document helps to satisfy the user's information need more precisely and sharply reduces the search space; it is regarded as a part of information retrieval. Tokenization involves pre-processing of documents and generates their respective tokens; on the basis of these tokens, probabilistic IR generates its scoring and yields a reduced search space. The number of tokens generated is the parameter used for result analysis.
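To make the pre-processing and token-counting steps concrete, the following is a minimal sketch of a generic tokenizer. It is not the training-vector algorithm proposed in the paper, which the abstract does not specify. The stop-word list, the function names `tokenize` and `token_counts`, and the sample sentence are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list, chosen only for this illustration.
STOP_WORDS = frozenset({"a", "an", "and", "the", "of", "is", "to", "in"})


def tokenize(text, stop_words=STOP_WORDS):
    """Lowercase the text, extract alphanumeric runs, and drop stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in stop_words]


def token_counts(text):
    """Return a mapping from each token to its number of occurrences."""
    return Counter(tokenize(text))


if __name__ == "__main__":
    doc = "Tokenization of the documents is a part of information retrieval."
    tokens = tokenize(doc)
    # The number of tokens generated is the quantity the abstract
    # names as the parameter for result analysis.
    print(len(tokens), tokens)
    print(token_counts(doc))
```

Dropping stop words before counting is one simple way to reduce the number of tokens an index has to store, which is the kind of reduced search space the abstract refers to.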