Conference: International Conference on Database and Expert Systems Applications

Solving Data Mismatches in Bioinformatics Workflows by Generating Data Converters

Abstract

Heterogeneity of data and data formats in bioinformatics entails mismatches between the inputs and outputs of different services, making it difficult to compose them into workflows. To reduce these mismatches, bioinformatics platforms offer ad hoc converters, called shims. When shims are written by hand, they are time-consuming to develop and cannot anticipate all needs. When shims are generated automatically, they miss some transformations, for example data composition from multiple parts or parallel conversion of list elements. This article proposes to systematically detect convertibility from output types to input types. Convertibility detection relies on a rule system based on abstract types, close to XML Schema. These types make it possible to abstract data while precisely accounting for their composite structure. Detection is accompanied by automatic generation of converters between input and output XML data. We show the applicability of our approach by abstracting concrete bioinformatics types (e.g., complex biosequences) for a number of bioinformatics services (e.g., BLAST). We illustrate how our automatically generated converters help to resolve data mismatches when composing workflows. We conducted an experiment on bioinformatics services and datatypes, using an implementation of our approach, as well as a survey with domain experts. The detected convertibilities and produced converters were validated as relevant from a biological point of view. Furthermore, the automatically produced graph of potentially compatible services exhibited higher connectivity than with the ad hoc approaches. Indeed, the experts discovered previously unknown possible connections.
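To make the idea of rule-based convertibility detection more concrete, here is a minimal Python sketch. It is not the rule system described in the article, whose rules are not given in this abstract. The type constructors (`Atom`, `ListOf`, `Record`), the `BASE` table of atomic conversions, and the three rules (atomic, element-wise list, field-wise record) are all assumptions chosen for illustration, and plain Python values stand in for XML data. The sketch shows how detecting convertibility between two composite types can produce a converter at the same time.

```python
from dataclasses import dataclass

# Hypothetical abstract types, loosely inspired by XML Schema structure.
@dataclass(frozen=True)
class Atom:
    name: str

@dataclass(frozen=True)
class ListOf:
    elem: object

@dataclass(frozen=True)
class Record:
    fields: tuple  # tuple of (field_name, type) pairs

# Assumed base conversions between atomic types (illustrative only).
BASE = {
    ("DNA", "Sequence"): lambda s: s.upper(),
}

def converter(src, dst):
    """Return a function turning src-typed values into dst-typed values,
    or None when no rule applies (the types are not convertible)."""
    if src == dst:
        return lambda v: v  # identical types: identity converter
    if isinstance(src, Atom) and isinstance(dst, Atom):
        return BASE.get((src.name, dst.name))  # atomic rule: table lookup
    if isinstance(src, ListOf) and isinstance(dst, ListOf):
        f = converter(src.elem, dst.elem)
        # List rule: convert every element independently.
        return None if f is None else (lambda vs: [f(v) for v in vs])
    if isinstance(src, Record) and isinstance(dst, Record):
        src_fields = dict(src.fields)
        parts = {}
        for name, field_type in dst.fields:
            if name not in src_fields:
                return None  # target needs a field the source lacks
            f = converter(src_fields[name], field_type)
            if f is None:
                return None
            parts[name] = f
        # Record rule: build the target from the converted source fields.
        return lambda rec: {n: f(rec[n]) for n, f in parts.items()}
    return None

# Example: a service outputs a list of scored DNA hits; another service
# expects a list of (id, generic sequence) records.
dna, seq = Atom("DNA"), Atom("Sequence")
src = ListOf(Record((("id", Atom("Id")), ("seq", dna), ("score", Atom("Float")))))
dst = ListOf(Record((("id", Atom("Id")), ("seq", seq))))

conv = converter(src, dst)
result = conv([{"id": "s1", "seq": "acgt", "score": 0.9}])
print(result)  # [{'id': 's1', 'seq': 'ACGT'}]
```

In this sketch, a `None` result means that no converter exists under the assumed rules. For instance, `converter(seq, dna)` returns `None` because `BASE` declares no conversion in that direction. Running the same check over every output/input pair of a set of services would yield a graph of potentially compatible services, in the spirit of the one mentioned in the abstract.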
