Journal: International Journal of Biomedical Engineering and Technology

Critical review of various near-duplicate detection methods in web crawl and their prospective application in drug discovery


Abstract

This review compares the near-duplicate detection methods available in the literature in terms of their application, utility, and context. In most cases their performance is highlighted, so that anyone choosing an algorithm can find the comparison useful. In addition, certain futuristic algorithms, such as oblique and streaming random forests, are reported; these should help researchers develop new algorithms especially suited to Big Data and cloud environments. The coverage is not exhaustive but nevertheless considers all the important algorithms used in practice, so that practitioners can find it handy when making implementation decisions. As an application case study, the use of a random forest approach to near-duplicate detection in Chinese herbal drug discovery is proposed.
