
A novel focused crawler based on cell-like membrane computing optimization algorithm


Abstract

In many studies, the topical priority of an unvisited hyperlink is computed as a linear combination of the topic-relevance similarities of various texts, each multiplied by a corresponding weighting factor. However, these weighting factors are set from personal experience, so poorly chosen values can directly cause serious deviations in the computed priorities. To solve this problem, this paper proposes a novel focused crawler that applies a cell-like membrane computing optimization algorithm (CMCFC). The CMCFC treats all the weighting factors, which correspond to the contribution degrees of the similarities of the various texts, as one object, and uses evolution rules and communication rules in membranes to reach the optimal object, that is, the optimal weighting factors that minimize the root mean square (RMS) error of the hyperlink priorities. It then linearly combines the optimal weighting factors with the corresponding topical similarities of the various texts, computed using a Vector Space Model (VSM), to obtain the priorities of unvisited hyperlinks. The CMCFC thereby obtains more accurate priorities for unvisited URLs, guiding the crawler to collect higher-quality web pages. The experimental results indicate that the proposed method improves the performance of focused crawlers by determining the weighting factors intelligently. In conclusion, the proposed approach is effective and significant for focused crawlers.
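
The priority computation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which texts are compared, how terms are weighted in the VSM, or how the membrane rules operate, so the raw term-frequency vectors, the function names (`cosine_similarity`, `link_priority`, `rms_error`), and the example texts and weights below are all assumptions made for illustration. The membrane computing optimization itself is not sketched; the weights are simply given as inputs.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """VSM-style cosine similarity over raw term-frequency vectors (an assumed weighting)."""
    va = Counter(text_a.lower().split())
    vb = Counter(text_b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def link_priority(similarities, weights):
    """Linear integration: sum of weight_i * similarity_i for one unvisited hyperlink."""
    if len(similarities) != len(weights):
        raise ValueError("similarities and weights must have the same length")
    return sum(w * s for w, s in zip(weights, similarities))


def rms_error(predicted, reference):
    """Root mean square error between predicted and reference priorities."""
    if not predicted or len(predicted) != len(reference):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - r) ** 2 for p, r in zip(predicted, reference))
    return math.sqrt(squared / len(predicted))


if __name__ == "__main__":
    # Hypothetical topic description and hypothetical texts associated with one link.
    topic = "focused crawler membrane computing"
    texts = ["focused crawler survey", "membrane computing optimization tutorial"]
    sims = [cosine_similarity(topic, t) for t in texts]

    # Hypothetical weighting factors; in the paper these are what the optimizer searches for.
    weights = [0.4, 0.6]
    print(link_priority(sims, weights))
```

In this framing, an optimizer would search over `weights` to minimize `rms_error` between the computed priorities and reference priorities for a set of training links; how the reference priorities are obtained is not stated in the abstract.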

Bibliographic details

  • Source
    Neurocomputing | 2014, Issue 10 | pp. 266-280 | 15 pages
  • Authors

    WenJun Liu; YaJun Du;

  • Affiliations

    School of Mathematics and Computer Engineering, Xihua University, Chengdu 610039, China;

    School of Mathematics and Computer Engineering, Xihua University, Chengdu 610039, China;

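  • Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification
  • Keywords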

    Focused crawler; Membrane computing; Optimization algorithm; VSM;

