International Journal of Computers & Applications

An ontology learning based approach for focused web crawling using combined normalized pointwise mutual information and Resnik algorithm



Abstract

In many existing works, the priority of an unvisited Uniform Resource Locator (URL) is computed as a linear combination of the similarities between the specified topic and different texts of the web page, each with an associated weight. These weights, however, are chosen by methods such as Term Frequency-Inverse Document Frequency (TF-IDF), so they can directly introduce severe deviations in the priorities of unvisited web pages. Moreover, such a similarity is computed only when a topic word literally occurs in the web page; the semantic similarity of the words in the page is not considered. To overcome these problems, this article presents a new focused web crawler, called P-crawler, based on a combination of Normalized Pointwise Mutual Information (NPMI) and the Resnik semantic similarity algorithm. In P-crawler, the record of an unvisited web page consists of its page text, anchor text, title text, bold text and heading text. The experimental findings show that the proposed algorithm improves the efficiency of the focused crawler. In conclusion, the technique is efficient and promising for focused web crawling.
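The abstract does not spell out how NPMI and Resnik similarity are computed or merged, so the following is only a minimal sketch using the standard textbook definitions, not P-crawler's actual formula. Assumptions: NPMI is estimated from document co-occurrence counts; Resnik similarity is the information content (IC) of the most informative common ancestor in a toy is-a taxonomy (`PARENT` and `FREQ` are hypothetical data); and the two scores are merged by a hypothetical convex combination with weight `alpha` after rescaling each to [0, 1].

```python
import math
from typing import Dict, List

# Hypothetical toy is-a taxonomy (child -> parent) with a single root, "artifact".
PARENT: Dict[str, str] = {
    "crawler": "software", "search_engine": "software", "browser": "software",
    "software": "artifact", "car": "vehicle", "vehicle": "artifact",
}
# Hypothetical corpus frequencies of each concept.
FREQ: Dict[str, int] = {
    "crawler": 2, "search_engine": 3, "browser": 5, "software": 10,
    "car": 20, "vehicle": 10, "artifact": 50,
}


def npmi(count_xy: int, count_x: int, count_y: int, n_docs: int) -> float:
    """NPMI(x, y) = ln(p(x,y) / (p(x) p(y))) / -ln p(x,y), ranging over [-1, 1]."""
    if count_xy == 0:
        return -1.0  # terms never co-occur: the conventional lower bound
    p_xy = count_xy / n_docs
    if p_xy == 1.0:
        return 1.0  # both terms occur in every document; avoid dividing by -ln(1) = 0
    p_x, p_y = count_x / n_docs, count_y / n_docs
    return math.log(p_xy / (p_x * p_y)) / -math.log(p_xy)


def ancestors(concept: str, parent: Dict[str, str]) -> List[str]:
    """The concept itself followed by every ancestor up to the root."""
    chain = [concept]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


def information_content(freq: Dict[str, int], parent: Dict[str, str]) -> Dict[str, float]:
    """IC(c) = -ln p(c), where the count for c includes all of its descendants."""
    cumulative = dict.fromkeys(set(freq) | set(parent) | set(parent.values()), 0)
    for concept, count in freq.items():
        for anc in ancestors(concept, parent):
            cumulative[anc] += count
    total = max(cumulative.values())  # the single root's cumulative count
    return {c: -math.log(n / total) for c, n in cumulative.items() if n > 0}


def resnik(c1: str, c2: str, parent: Dict[str, str], ic: Dict[str, float]) -> float:
    """Resnik similarity: IC of the most informative common ancestor (0.0 if none)."""
    common = set(ancestors(c1, parent)) & set(ancestors(c2, parent))
    return max((ic[c] for c in common if c in ic), default=0.0)


def combined_similarity(npmi_score: float, resnik_score: float,
                        max_ic: float, alpha: float = 0.5) -> float:
    """Hypothetical convex combination of both scores, each rescaled to [0, 1]."""
    npmi_01 = (npmi_score + 1.0) / 2.0
    resnik_01 = resnik_score / max_ic if max_ic > 0 else 0.0
    return alpha * npmi_01 + (1.0 - alpha) * resnik_01


if __name__ == "__main__":
    ic = information_content(FREQ, PARENT)
    s_npmi = npmi(count_xy=8, count_x=10, count_y=12, n_docs=100)
    s_res = resnik("crawler", "search_engine", PARENT, ic)
    s_all = combined_similarity(s_npmi, s_res, max(ic.values()))
    print(f"NPMI={s_npmi:.3f}  Resnik={s_res:.3f}  combined={s_all:.3f}")
```

Rescaling NPMI from [-1, 1] and dividing the Resnik score by the largest IC puts both on a common scale before mixing. A real crawler would also need to estimate the counts from a corpus and map page words onto ontology concepts, which this sketch skips.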

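The priority computation described at the start of the abstract, a weighted linear combination of per-field similarities, can be sketched as below, assuming each record stores the five text fields named in the abstract as token lists. The field keys, the weights in `WEIGHTS`, the max-then-average field score and the `Frontier` class are all hypothetical illustration choices. With the default `exact_match`, a field earns credit only when a topic word literally appears in it, which is the limitation the abstract points out; passing a semantic term similarity as `sim` (for example an NPMI/Resnik-style score) removes that restriction.

```python
import heapq
from typing import Callable, Dict, List, Sequence, Tuple

FIELDS = ("page_text", "anchor_text", "title_text", "bold_text", "heading_text")
# Hypothetical weights; the abstract does not report the values P-crawler uses.
WEIGHTS = {"page_text": 0.2, "anchor_text": 0.3, "title_text": 0.2,
           "bold_text": 0.1, "heading_text": 0.2}

TermSim = Callable[[str, str], float]


def exact_match(a: str, b: str) -> float:
    """Literal matching only: no credit for semantically related words."""
    return 1.0 if a == b else 0.0


def field_similarity(topic: Sequence[str], field: Sequence[str], sim: TermSim) -> float:
    """Average over topic terms of the best similarity to any term in the field."""
    if not topic or not field:
        return 0.0
    return sum(max(sim(t, w) for w in field) for t in topic) / len(topic)


def url_priority(record: Dict[str, Sequence[str]], topic: Sequence[str],
                 sim: TermSim = exact_match,
                 weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted linear combination of the five per-field similarities."""
    return sum(weights[f] * field_similarity(topic, record.get(f, ()), sim)
               for f in FIELDS)


class Frontier:
    """Max-priority queue of unvisited URLs (heapq is a min-heap, so negate)."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, str]] = []
        self._counter = 0  # tie-breaker keeps insertion order for equal priorities

    def push(self, url: str, priority: float) -> None:
        heapq.heappush(self._heap, (-priority, self._counter, url))
        self._counter += 1

    def pop(self) -> Tuple[str, float]:
        neg_priority, _, url = heapq.heappop(self._heap)
        return url, -neg_priority


if __name__ == "__main__":
    topic = ["crawler", "ontology"]
    # Hypothetical semantic similarity: exact match, or 0.8 for a known synonym pair.
    synonyms = {("crawler", "spider"), ("spider", "crawler")}

    def toy_sim(a: str, b: str) -> float:
        return 1.0 if a == b else (0.8 if (a, b) in synonyms else 0.0)

    record = {"anchor_text": ["spider", "tutorial"], "title_text": ["web", "crawler"]}
    frontier = Frontier()
    frontier.push("http://example.com/literal", url_priority(record, topic))
    frontier.push("http://example.com/semantic", url_priority(record, topic, sim=toy_sim))
    print(frontier.pop())
```

The same record scores higher under `toy_sim` because "spider" in the anchor text now counts toward the topic term "crawler", so the frontier pops that entry first.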