Conference: Protein Engineering Summit

ProtaBank: A Comprehensive Database for Protein Engineering and Design


Abstract

Recent advances in gene synthesis, microfluidics, deep sequencing, and microarray techniques have made it possible to construct and assay large libraries of variant protein sequences. This rapid generation of large sets of mutational data has significantly enhanced researchers' ability to study how proteins function and to engineer proteins with new and improved properties. Although many groups around the world are currently generating large amounts of protein engineering data, there is no standardized format for reporting these data and no simple mechanism for groups to share the data that they generate. We have developed ProtaBank, a comprehensive database for protein engineering data where users can store their data as well as query and analyze data submitted by themselves and others. ProtaBank stores the data in a relational database using a standardized schema that requires full protein sequence information and detailed assay descriptions. These features allow for accurate comparison of measurements made across different proteins and by different groups. ProtaBank is comprehensive in that it accepts data for several different protein properties, including those related to stability, folding, activity, and binding. ProtaBank thus provides a central repository for data that are often scattered across many different specialized databases. ProtaBank features a web interface and a REST API that streamline data deposition and allow for batch input and queries. A suite of analysis tools is provided to allow for discovery and analysis of relationships between mutated sequences. We demonstrate the importance of a standardized format for reporting protein engineering data, one that allows for accurate comparisons between different data sets and enables future data mining and machine learning approaches to be applied.
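To make the idea of a standardized record concrete, here is a minimal Python sketch of what "full protein sequence information plus a detailed assay description" could look like in code. It is not ProtaBank's actual schema or API: the names `Assay`, `Measurement`, `apply_mutations`, and `mutations_between`, and the shorthand format `K2R` (wild-type residue, 1-based position, new residue), are assumptions chosen for illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass

# The 20 standard amino acids in one-letter code.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


@dataclass(frozen=True)
class Assay:
    """Hypothetical assay description; field names are illustrative."""
    name: str               # e.g. "thermal denaturation"
    property_measured: str  # e.g. "stability", "binding"
    units: str              # e.g. "kcal/mol"


@dataclass(frozen=True)
class Measurement:
    """One data point tied to a full sequence rather than a mutation label."""
    sequence: str
    assay: Assay
    value: float

    def __post_init__(self) -> None:
        # Reject empty sequences and non-standard residue codes.
        if not self.sequence:
            raise ValueError("a full protein sequence is required")
        bad = set(self.sequence) - VALID_RESIDUES
        if bad:
            raise ValueError(f"invalid residue codes: {sorted(bad)}")


def apply_mutations(wild_type: str, mutations: list[str]) -> str:
    """Expand shorthand such as 'K2R' (1-based) into the full variant sequence."""
    seq = list(wild_type)
    for m in mutations:
        old, pos, new = m[0], int(m[1:-1]), m[-1]
        if not 1 <= pos <= len(seq):
            raise ValueError(f"position out of range in {m}")
        if seq[pos - 1] != old:
            raise ValueError(f"{m} does not match residue {seq[pos - 1]}")
        seq[pos - 1] = new
    return "".join(seq)


def mutations_between(reference: str, variant: str) -> list[str]:
    """List point substitutions between two sequences of equal length."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have equal length")
    return [
        f"{r}{i}{v}"
        for i, (r, v) in enumerate(zip(reference, variant), start=1)
        if r != v
    ]


if __name__ == "__main__":
    wild_type = "MKTAYIAK"  # made-up example sequence
    variant = apply_mutations(wild_type, ["K2R", "A4G"])
    assay = Assay("thermal denaturation", "stability", "kcal/mol")
    record = Measurement(variant, assay, value=-1.2)  # value is invented
    print(record.sequence, mutations_between(wild_type, variant))
```

Storing the expanded sequence instead of only a label such as `K2R` means two records can be compared without knowing which numbering scheme or reference construct each group used, which is the kind of cross-group comparison the abstract describes.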
