
A Query Suggestion Workflow for Life Science IR-Systems

Abstract

Information Retrieval (IR) plays a central role in the exploration and interpretation of integrated biological datasets that represent the heterogeneous ecosystem of life sciences. Here, keyword-based query systems are popular user interfaces. Consequently, the query phrases used largely determine the quality of the search result and the effort a scientist has to invest in query refinement. In this context, computer-aided query expansion and suggestion is one of the most challenging tasks for life science information systems. Existing query front-ends support aspects such as spelling correction, query refinement or query expansion. However, most front-ends make only limited use of enhanced IR algorithms to implement comprehensive, computer-aided query refinement workflows. In this work, we present the design of a multi-stage query suggestion workflow and its implementation in the life science IR system LAILAPS. The presented workflow includes enhanced tokenisation, word breaking, spelling correction, query expansion and query suggestion ranking. A spelling correction benchmark with 5,401 queries and manually selected use cases for query expansion demonstrate the performance of the implemented workflow and its advantages compared with state-of-the-art systems.
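The five stages named in the abstract (tokenisation, word breaking, spelling correction, query expansion and suggestion ranking) can be chained as a pipeline. The following is a minimal Python sketch under stated assumptions, not the LAILAPS implementation: the vocabulary, its frequencies and the synonym table are hypothetical toy data; word breaking only tries a single split into two known words; spelling correction is plain Levenshtein distance against the vocabulary; and ranking simply sums term frequencies.

```python
import re
from itertools import product

# Hypothetical toy vocabulary with made-up corpus frequencies (not LAILAPS data).
VOCAB = {"heat": 50, "shock": 40, "protein": 120, "barley": 30, "drought": 25, "stress": 60}
# Hypothetical synonym table standing in for a domain thesaurus.
SYNONYMS = {"protein": ["polypeptide"], "drought": ["water deficit"]}


def tokenise(query):
    """Lower-case the query and split on any non-alphanumeric character."""
    return [t for t in re.split(r"[^a-z0-9]+", query.lower()) if t]


def word_break(token):
    """Split a run-together token into two known words, if such a split exists."""
    if token in VOCAB:
        return [token]
    for i in range(1, len(token)):
        left, right = token[:i], token[i:]
        if left in VOCAB and right in VOCAB:
            return [left, right]
    return [token]


def edit_distance(a, b):
    """Levenshtein distance via the standard two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def correct(token, max_dist=2):
    """Return vocabulary words within max_dist edits, or the token unchanged."""
    if token in VOCAB:
        return [token]
    candidates = [w for w in VOCAB if edit_distance(token, w) <= max_dist]
    return candidates or [token]


def expand(term):
    """Add hypothetical synonyms for a term."""
    return [term] + SYNONYMS.get(term, [])


def suggest(query, k=3):
    """Run all stages and return the top-k ranked query suggestions."""
    tokens = [w for t in tokenise(query) for w in word_break(t)]
    options = []
    for t in tokens:
        alternatives = []
        for c in correct(t):
            for e in expand(c):
                if e not in alternatives:
                    alternatives.append(e)
        options.append(alternatives)

    # Toy ranking: sum of vocabulary frequencies; terms outside VOCAB count as 1.
    def score(combo):
        return sum(VOCAB.get(w, 1) for w in combo)

    ranked = sorted(product(*options), key=lambda c: (-score(c), c))
    return [" ".join(c) for c in ranked[:k]]
```

With this toy data, `suggest("heatshock protien")` breaks "heatshock" into "heat shock", corrects "protien" to "protein", expands it with the hypothetical synonym "polypeptide", and ranks "heat shock protein" above "heat shock polypeptide". A real system would replace each stage with a corpus-derived model; keeping the stages as separate functions makes that substitution straightforward.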
