METHOD FOR EXTRACTING SUBJECT AND SORTING DOCUMENT OF SEARCHING ENGINE, COMPUTER READABLE RECORD MEDIUM ON WHICH PROGRAM FOR EXECUTING METHOD IS RECORDED

Abstract

Provided are a method for extracting subjects and sorting documents in a search engine, and a computer-readable recording medium storing a program that executes the method. The method lets a user reach desired information quickly and conveniently by selecting atypical and varied subjects that manual classification does not cover, classifying the target documents under each subject, and determining whether a retrieved document suits its subject. For the keywords contained in the target documents, a relation degree is measured that represents the extent to which the respective keywords are selected at the same time. A convergence relation degree is then measured between the word set for a given keyword and the word sets related to other keywords. A keyword is selected as a subject when its convergence relation degree is higher than a specific value. A naive Bayesian probability is calculated by performing naive Bayesian training on the training documents and on each keyword contained in the target documents. A vector size is calculated for each keyword in the training and target documents, and the distance between the training-document and target-document vector sizes of each keyword is calculated. The similarity of each keyword is calculated by multiplying its naive Bayesian probability by this distance. A ranking value is calculated by processing the similarities of the keywords contained in the target document.
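The abstract names the steps but gives no formulas, so the following Python sketch is only one possible reading. Every concrete choice is an assumption rather than the patented method: the relation degree is taken to be the Jaccard co-occurrence of two keywords across documents, the convergence relation degree is taken to be the mean Jaccard overlap between a keyword's word set and the word sets of its related keywords, the "vector size" is taken to be a keyword's relative frequency, the "distance" factor is expressed as a closeness score `1 / (1 + |difference|)`, and the ranking value is the sum of per-keyword similarities. All function names and thresholds are hypothetical.

```python
from collections import Counter
from itertools import combinations


def relation_degrees(docs):
    """Assumed relation degree: Jaccard co-occurrence of each keyword pair."""
    doc_freq, pair_freq = Counter(), Counter()
    for doc in docs:
        words = sorted(set(doc))
        doc_freq.update(words)
        pair_freq.update(combinations(words, 2))
    rel = {}
    for (a, b), n_ab in pair_freq.items():
        score = n_ab / (doc_freq[a] + doc_freq[b] - n_ab)
        rel[(a, b)] = rel[(b, a)] = score
    return rel


def word_sets(rel, min_relation):
    """Word set of a keyword: keywords related to it at or above min_relation."""
    sets = {}
    for (a, b), score in rel.items():
        if score >= min_relation:
            sets.setdefault(a, set()).add(b)
    return sets


def convergence(keyword, sets):
    """Assumed convergence relation degree: mean Jaccard overlap between the
    keyword's word set and the word sets of its related keywords."""
    neighbours = sets.get(keyword, set())
    if not neighbours:
        return 0.0
    own = neighbours | {keyword}
    total = 0.0
    for other in neighbours:
        theirs = sets.get(other, set()) | {other}
        total += len(own & theirs) / len(own | theirs)
    return total / len(neighbours)


def extract_subjects(docs, min_relation=0.6, min_convergence=0.7):
    """Select keywords whose convergence exceeds the (hypothetical) threshold."""
    sets = word_sets(relation_degrees(docs), min_relation)
    return sorted(k for k in sets if convergence(k, sets) > min_convergence)


def rank_value(target_doc, training_docs):
    """Sum over target keywords of (naive Bayes probability * closeness)."""
    train_counts = Counter(w for d in training_docs for w in d)
    train_total = sum(train_counts.values())
    vocab_size = len(set(train_counts) | set(target_doc))
    value = 0.0
    for word, n in Counter(target_doc).items():
        # Laplace-smoothed P(word | subject), estimated from training documents.
        prob = (train_counts[word] + 1) / (train_total + vocab_size)
        # Assumed "vector size": relative frequency of the keyword.
        train_size = train_counts[word] / train_total
        target_size = n / len(target_doc)
        # Assumed "distance" factor, written so that closer means larger.
        closeness = 1.0 / (1.0 + abs(train_size - target_size))
        value += prob * closeness
    return value
```

Under these assumptions, a keyword that co-occurs with everything (such as a generic word present in every document) has a word set that overlaps poorly with those of its neighbours and is not selected as a subject, while a target document that shares keywords with a subject's training documents receives a higher ranking value than an unrelated one.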