Journal: BMC Bioinformatics

Development and tuning of an original search engine for patent libraries in medicinal chemistry



Abstract

Background: The large increase in the size of patent collections has created a need for efficient search strategies. Yet advanced text-mining applications dedicated to biomedical patents remain rare, in particular ones that address the needs of the pharmaceutical and biotech industry, which uses patent libraries intensively for competitive intelligence and drug development.

Methods: We describe the development of an advanced retrieval engine for searching patent collections in the field of medicinal chemistry. We investigate and combine different strategies and evaluate their respective impact on the performance of the search engine across several search tasks, which cover the putatively most frequent search behaviours of intellectual property officers in medicinal chemistry: 1) a prior art search task; 2) a technical survey task; and 3) a variant of the technical survey task, sometimes called a known-item search task, in which a single patent is targeted.

Results: The optimal tuning of our engine resulted in a top-precision of 6.76% for the prior art search task, 23.28% for the technical survey task and 46.02% for the variant of the technical survey task. We observed that co-citation boosting was an appropriate strategy for improving prior art search, while IPC classification of queries improved retrieval effectiveness for technical survey tasks. Surprisingly, using the full body of the patent was always detrimental to search effectiveness. We also observed that normalizing biomedical entities using curated dictionaries had no impact on the search tasks we evaluated. The search engine was finally implemented as a web application within Novartis Pharma; the application is briefly described in the report.

Conclusions: We have presented the development of a search engine dedicated to patent search, based on state-of-the-art methods applied to patent corpora. We have shown that properly tuning the system to each search task clearly increases its effectiveness. We conclude that different search tasks demand different retrieval engine settings in order to yield optimal end-user retrieval.
