Proteomics data interoperation with applications to integrated data mining and enhanced information retrieval.

Abstract

This thesis addresses the integration and interoperation of large-scale, widely distributed, and independently maintained data. It focuses on biological proteomics data, which exemplifies the problem and has a practical need for better interoperation, and shows how such integrated data can be leveraged for applications such as detailed cross-database queries in support of exploratory scientific data analysis and enhanced information retrieval. Semantic Web RDF and RDF databases, which fit the problem well, are used to build two biological data integration systems, YeastHub and LinkHub. YeastHub is a lightweight Semantic Web data warehouse of joined, RDF-formatted biological (yeast) data that supports RDF query access. LinkHub focuses on a high-level structuring principle, or "scaffold", for biological data: it stores biological identifiers (e.g., for proteins and genes) and the complex relationships among them as a large directed, labeled RDF graph. LinkHub is used through interactive web and query interfaces and also complements YeastHub. Through several nontrivial RDF queries over the joined YeastHub and LinkHub data, we demonstrate that practical integrated biological data analysis can be achieved by basic, lightweight methods that do not attempt to solve the complete integration problem.

A key focus of the LinkHub system is support for enhanced information retrieval of web documents and articles from the biomedical scientific literature (PubMed). We attach documents to identifier nodes in the LinkHub RDF graph and provide for flexible retrieval of the documents through queries of the RDF graph's relational structure. In addition, we use the LinkHub RDF relational data and attached documents as training sets to construct classifiers for document relevance ranking, in support of enhanced automated retrieval of web or biomedical literature documents related to biological identifiers. We present the results of experiments that empirically measure the performance of this enhanced automated information retrieval for documents related to proteomics (UniProt) identifiers, using a manually curated bibliography of yeast protein-specific literature citations.
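
The idea of attaching documents to identifier nodes and retrieving them through the graph's relational structure can be illustrated with a small sketch. This is not the LinkHub implementation: it uses plain Python dictionaries instead of an RDF database, and every identifier, predicate name, and document ID below (for example `uniprot:EXAMPLE1`, `encoded_by`, `doc:A`) is a hypothetical placeholder. The traversal is a simple breadth-first search with a hop limit, chosen only to show how documents attached to related identifiers become reachable from a starting identifier.

```python
from collections import defaultdict, deque


class IdentifierGraph:
    """Directed labeled graph of (subject, predicate, object) triples,
    with documents attached to identifier nodes. Illustrative only."""

    def __init__(self):
        self.edges = defaultdict(list)     # subject -> [(predicate, object)]
        self.documents = defaultdict(set)  # node -> {document ids}

    def add_triple(self, subject, predicate, obj):
        self.edges[subject].append((predicate, obj))

    def attach_document(self, node, doc_id):
        self.documents[node].add(doc_id)

    def related_documents(self, start, max_hops=2):
        """Collect documents attached to `start` and to any node reachable
        from it by following at most `max_hops` outgoing edges."""
        seen = {start}
        queue = deque([(start, 0)])
        found = set()
        while queue:
            node, depth = queue.popleft()
            found |= self.documents.get(node, set())
            if depth == max_hops:
                continue
            for _predicate, neighbor in self.edges.get(node, []):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append((neighbor, depth + 1))
        return found


def build_example_graph():
    """Build a tiny graph from hypothetical identifiers and documents."""
    g = IdentifierGraph()
    g.add_triple("uniprot:EXAMPLE1", "encoded_by", "sgd:EXAMPLE_GENE")
    g.add_triple("uniprot:EXAMPLE1", "member_of", "pfam:EXAMPLE_FAMILY")
    g.add_triple("sgd:EXAMPLE_GENE", "annotated_with", "go:EXAMPLE_TERM")
    g.attach_document("uniprot:EXAMPLE1", "doc:A")
    g.attach_document("sgd:EXAMPLE_GENE", "doc:B")
    g.attach_document("pfam:EXAMPLE_FAMILY", "doc:C")
    g.attach_document("go:EXAMPLE_TERM", "doc:D")
    return g


if __name__ == "__main__":
    graph = build_example_graph()
    print(sorted(graph.related_documents("uniprot:EXAMPLE1", max_hops=1)))
```

Raising `max_hops` widens the set of retrieved documents at the cost of pulling in more loosely related material, which is one reason a ranking step such as the relevance classifiers mentioned in the abstract would be useful on top of plain graph traversal.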

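Bibliographic record

  • Author

    Smith, Andrew Kendall

  • Author affiliation

    Yale University

  • Degree-granting institution Yale University
  • Subject Biology, Bioinformatics; Computer Science
  • Degree Ph.D.
  • Year 2006
  • Pages 192 p.
  • Total pages 192
  • Original format PDF
  • Language of text English
  • Chinese Library Classification Automation technology, computer technology
  • Keywords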
