Source: U.S. National Institutes of Health (NIH) literature

Knowledge and Theme Discovery across Very Large Biological Data Sets Using Distributed Queries: A Prototype Combining Unstructured and Structured Data

Abstract

As biomedical science continues to adopt new technologies capable of producing unprecedented volumes of noisy and complex biological data, it has become evident that available methods for deriving meaningful information from such data are simply not keeping pace. To achieve useful results, researchers require methods that consolidate, store, and query combinations of structured and unstructured data sets efficiently and effectively. As we move towards personalized medicine, the need to combine unstructured data, such as medical literature, with large amounts of highly structured, high-throughput data, such as human variation or expression data from very large cohorts, is especially urgent. In this study, we investigated a realistic biomedical query using the Hadoop framework. We ran queries using native MapReduce tools that we developed as well as other open-source and proprietary tools. Our results suggest that the available technologies within the Big Data domain can reduce the time and effort needed to apply distributed queries over large data sets in practical clinical applications in the life sciences. The methodologies and technologies discussed in this paper set the stage for a more detailed evaluation of how various data structures and data models are best mapped to the appropriate computational framework.
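The abstract does not describe the authors' MapReduce tools in detail. As a hedged illustration only, the sketch below simulates, in plain Python on a single machine, a reduce-side join: one common MapReduce pattern for combining an unstructured source (free-text abstracts) with a structured one (variant records) on a shared key, here a gene symbol. The gene vocabulary, the toy records, and all function names are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from itertools import chain
import re

# Hypothetical toy inputs (illustrative only, not data from the paper).
ABSTRACTS = {
    "doc1": "BRCA1 mutations alter DNA repair.",
    "doc2": "Expression of TP53 and BRCA1 in tumours.",
    "doc3": "A study of EGFR signalling.",
}
VARIANTS = [
    ("BRCA1", "var_001"),
    ("TP53", "var_002"),
    ("KRAS", "var_003"),
]
# Assumed gene vocabulary used to spot mentions in free text.
GENE_SYMBOLS = {"BRCA1", "TP53", "EGFR", "KRAS"}


def map_abstract(doc_id, text):
    """Map step for unstructured text: emit (gene, ('doc', doc_id)) per mentioned gene."""
    for token in set(re.findall(r"[A-Za-z0-9]+", text)):
        if token in GENE_SYMBOLS:
            yield token, ("doc", doc_id)


def map_variant(gene, variant_id):
    """Map step for structured records: emit (gene, ('var', variant_id))."""
    yield gene, ("var", variant_id)


def shuffle(pairs):
    """Group mapper output by key, standing in for the framework's shuffle phase."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_join(gene, values):
    """Reduce step: inner join, keeping genes seen in both literature and variants."""
    docs = sorted(v for tag, v in values if tag == "doc")
    variants = sorted(v for tag, v in values if tag == "var")
    if docs and variants:
        yield gene, docs, variants


def run_job():
    """Run map, shuffle, and reduce over both inputs; return joined rows sorted by gene."""
    mapped = chain(
        chain.from_iterable(map_abstract(d, t) for d, t in ABSTRACTS.items()),
        chain.from_iterable(map_variant(g, v) for g, v in VARIANTS),
    )
    groups = shuffle(mapped)
    return sorted(chain.from_iterable(reduce_join(k, vs) for k, vs in groups.items()))


if __name__ == "__main__":
    for row in run_job():
        print(row)
```

On an actual Hadoop cluster, the two map functions would run as mappers over their respective inputs and the framework itself would perform the shuffle; the `shuffle` function here only imitates that step so the example stays self-contained.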