Journal: Pattern Recognition Letters

Keyword weight optimization using gradient strategies in event focused web crawling


Abstract

There is a need for an integrated event-focused crawling system that gathers web data about key events. During a disaster or any other important event, many users look for up-to-date information about it. The volume of information on the web is growing rapidly, which makes it challenging for any search engine to retrieve the necessary information properly. A web crawler is a primary component of such search engines, so optimizing the crawler is a major aspect of improving search efficiency. The large size and dynamic nature of web information, together with continuous document and data updates, characterize web-based retrieval systems. Focused crawling relies on automatic webpage classification to determine whether a page is relevant. Although various classifiers are used for this purpose, the identification of keywords plays an important role in improving event-focused web crawling. This work proposes a novel and efficient method for enhancing the keyword set. Metaheuristic-based optimized keyword weights have been found to be efficient. The proposed approach uses Term Frequency (TF) based feature extraction and keyword weight optimization with the Stochastic Gradient Descent (SGD) algorithm in event-focused web crawling. Gradient descent is a popular optimization algorithm; its stochastic variant handles fitness functions that are differentiable or sub-differentiable and is well suited to optimization over large data sets. The algorithm aims to make the keyword set optimal: with a better keyword set, the returned documents can be more relevant to users' queries. Support Vector Machine (SVM) classifiers are employed for this purpose. In the experiments, the proposed technique outperformed the alternatives, including a Particle Swarm Optimization (PSO) based weight-optimized solution. The proposed SGD weight optimization is 5.8% better than PSO, which indicates its ability to handle high volumes of data. (c) 2020 Elsevier B.V. All rights reserved.
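
The abstract does not give the fitness function, the learning rate, or how the optimized weights feed the SVM, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It computes term-frequency features over a fixed keyword set and fits one weight per keyword by SGD on a regularized hinge loss, which is the sub-differentiable objective of a linear SVM. The names `term_frequencies`, `sgd_keyword_weights`, and `relevance_score`, the toy pages, and all hyperparameters are hypothetical.

```python
import random
from collections import Counter


def term_frequencies(text, keywords):
    """Relative frequency of each keyword in a text (naive whitespace tokens)."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = max(len(tokens), 1)
    return [counts[k] / total for k in keywords]


def sgd_keyword_weights(features, labels, epochs=200, lr=0.5, reg=0.001, seed=0):
    """Fit one weight per keyword plus a bias by SGD on an L2-regularized hinge loss.

    labels are +1 (on-event page) or -1 (off-event page).
    Assumption: the hinge loss stands in for the paper's unspecified fitness function.
    """
    rng = random.Random(seed)
    weights = [0.0] * len(features[0])
    bias = 0.0
    order = list(range(len(features)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit one randomly chosen sample at a time
        for i in order:
            x, y = features[i], labels[i]
            margin = y * (sum(w * v for w, v in zip(weights, x)) + bias)
            # Gradient of the L2 penalty: shrink every weight slightly.
            weights = [w * (1 - lr * reg) for w in weights]
            # Sub-gradient of the hinge loss is non-zero only inside the margin.
            if margin < 1:
                weights = [w + lr * y * v for w, v in zip(weights, x)]
                bias += lr * y
    return weights, bias


def relevance_score(text, keywords, weights, bias):
    """Weighted keyword score; larger means more likely on-event."""
    tf = term_frequencies(text, keywords)
    return sum(w * v for w, v in zip(weights, tf)) + bias


# Toy data (hypothetical): two on-event pages and two off-event pages.
keywords = ["earthquake", "rescue", "football"]
pages = [
    ("earthquake rescue teams reach the city after the earthquake", 1),
    ("rescue workers search rubble after earthquake", 1),
    ("football match ends in draw", -1),
    ("city football club signs new striker", -1),
]
features = [term_frequencies(text, keywords) for text, _ in pages]
labels = [label for _, label in pages]
weights, bias = sgd_keyword_weights(features, labels)

on_event = relevance_score("earthquake rescue update", keywords, weights, bias)
off_event = relevance_score("football results", keywords, weights, bias)
```

In this sketch the learned weights rank candidate pages: keywords that occur in on-event pages receive positive weights and keywords from off-event pages receive negative ones, so `on_event` scores higher than `off_event`. A real crawler would need proper tokenization and a much larger labeled set.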
