Unique function words characterize genomic proteins

Abstract

Between 2009 and 2016 the number of protein sequences from known species increased 10-fold from 8 million to 85 million. About 80% of these sequences contain at least one region recognized by the conserved domain architecture retrieval tool (CDART) as a sequence motif. Motifs provide clues to biological function but CDART often matches the same region of a protein by two or more profiles. Such synonyms complicate estimates of functional complexity. We do full-linkage clustering of redundant profiles by finding maximum disjoint cliques: Each cluster is replaced by a single representative profile to give what we term a unique function word (UFW). From 2009 to 2016, the number of sequence profiles used by CDART increased by 80%; the number of UFWs increased more slowly by 30%, indicating that the number of UFWs may be saturating. The number of sequences matched by a single UFW (sequences with single domain architectures) increased as slowly as the number of different words, whereas the number of sequences matched by a combination of two or more UFWs in sequences with multiple domain architectures (MDAs) increased at the same rate as the total number of sequences. This combinatorial arrangement of a limited number of UFWs in MDAs accounts for the genomic diversity of protein sequences. Although eukaryotes and prokaryotes use very similar sets of “words” or UFWs (57% shared), the “sentences” (MDAs) are different (1.3% shared).
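The clustering step described in the abstract can be pictured with a small sketch. It is an illustration under stated assumptions, not the authors' implementation: the profile names, the `overlaps` pairs (profiles that match the same protein region), the `match_counts` table, and the rule of picking the most frequently matched profile as the cluster representative are all hypothetical. The sketch greedily extracts the largest clique from the overlap graph, removes it, and repeats, so that every cluster is a set of profiles that all overlap one another (full linkage) and no profile belongs to two clusters.

```python
def max_clique(nodes, adj):
    """Return one largest clique among `nodes` (Bron-Kerbosch without pivoting)."""
    best = set()

    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            # r is a maximal clique; keep it only if it beats the current best
            if len(r) > len(best):
                best = set(r)
            return
        for v in sorted(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(nodes), set())
    return best


def unique_function_words(overlaps, match_counts):
    """Partition profiles into disjoint cliques; map representative -> cluster.

    overlaps:     iterable of (profile_a, profile_b) pairs judged redundant (assumed input)
    match_counts: dict profile -> number of matched sequences (assumed input)
    """
    adj = {p: set() for p in match_counts}
    for a, b in overlaps:
        adj[a].add(b)
        adj[b].add(a)

    remaining = set(adj)
    clusters = {}
    while remaining:
        clique = max_clique(remaining, adj)
        # Hypothetical choice: the most frequently matched profile represents the cluster
        rep = max(sorted(clique), key=lambda p: match_counts[p])
        clusters[rep] = clique
        remaining -= clique
    return clusters


# Toy data: A, B, C are mutually redundant; D overlaps only C; E overlaps nothing.
overlaps = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
match_counts = {"A": 50, "B": 120, "C": 80, "D": 30, "E": 10}
clusters = unique_function_words(overlaps, match_counts)
print(len(clusters), "unique function words:", sorted(clusters))
```

In this toy case the five profiles collapse to three UFWs. D stays separate from the A/B/C cluster because it overlaps only C, which is the point of full linkage: one shared neighbor is not enough to merge. The exhaustive clique search is exponential in the worst case, so it suits only small illustrative graphs.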


Bibliographic record

Journal: Proceedings of the National Academy of Sciences of the United States of America
Authors: Andrea Scaiewicz; Michael Levitt
Year (volume), issue: 2018 (115), 26
Pages: 6703–6708
Total pages: 6
Original format: PDF
Keywords: protein universe; genomic sequences; functional profiles; domain architecture; shared function

