Source: Transactions on Large-Scale Data- and Knowledge-Centered Systems III

Integrating Large and Distributed Life Sciences Resources for Systems Biology Research: Progress and New Challenges



Abstract

Researchers in Systems Biology routinely access vast collections of hidden web research resources that are freely available on the internet. These collections include online data repositories, online and downloadable data analysis tools, publications, text mining systems, visualization artifacts, etc. Almost always, these resources have complex data formats that are heterogeneous in representation, data type, interpretation and even identity. Researchers are therefore often forced to develop analysis pipelines and data management applications that involve extensive, prohibitively costly manual interaction. Such approaches are a barrier to optimal use of these resources and thus impede the progress of research. In this paper, we discuss our experience of building a new middleware approach to data and application integration for Systems Biology that leverages recent developments in schema matching, wrapper generation, workflow management, and query language design. This approach advocates ad hoc integration of arbitrary resources and construction of computational pipelines using a declarative language. We highlight the features and advantages of this new data management system, called LifeDB, and its query language BioFlow. Based on this experience, we outline the new challenges the approach raises and potential solutions to them, as steps toward a viable platform for large-scale autonomous data integration. We believe the research issues we raise are of general interest to the autonomous data integration community and apply equally to research unrelated to LifeDB.
