Obtaining maximal concatenated phylogenetic data sets from large sequence databases

Sanderson MJ; Driskell AC; Ree RH; Eulenstein O; Langley S

Abstract

To improve the accuracy of tree reconstruction, phylogeneticists are extracting increasingly large multigene data sets from sequence databases. Determining whether a database contains at least k genes sampled from at least m species is an NP-complete problem. However, the skewed distribution of sequences in these databases permits all such data sets to be obtained in reasonable computing times even for large numbers of sequences. We developed an exact algorithm for obtaining the largest multigene data sets from a collection of sequences. The algorithm was then tested on a set of 100,000 protein sequences of green plants and used to identify the largest multigene ortholog data sets having at least 3 genes and 6 species. The distribution of sizes of these data sets forms a hollow curve, and the largest are surprisingly small, ranging from 62 genes by 6 species, to 3 genes by 65 species, with more symmetrical data sets of around 15 taxa by 15 genes. These upper bounds to sequence concatenation have important implications for building the tree of life from large sequence databases.
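
The abstract does not describe the exact algorithm, so the following is only a minimal sketch of the underlying problem, not the authors' method. It treats the database as a gene-by-species presence table and enumerates every maximal data set (a maximal biclique) with at least k genes and m species by repeatedly intersecting the species sets of individual genes. The function name `maximal_data_sets`, its parameters, and the toy gene and species labels are all hypothetical, and the worst-case running time is exponential, as expected for an NP-complete problem.

```python
def maximal_data_sets(presence, min_genes, min_species):
    """Enumerate maximal (genes, species) blocks with no missing sequences.

    presence: dict mapping gene name -> set of species sampled for that gene
              (an assumed input format, not taken from the paper).
    Returns a list of (frozenset_of_genes, frozenset_of_species) pairs.
    """
    # Genes sampled in fewer than min_species species can never qualify.
    cols = {g: frozenset(s) for g, s in presence.items()
            if len(s) >= min_species}

    # Collect every distinct intersection of gene columns that still
    # covers at least min_species species (fixpoint iteration).
    closed = set(cols.values())
    frontier = set(closed)
    while frontier:
        new = set()
        for a in frontier:
            for b in cols.values():
                c = a & b
                if len(c) >= min_species and c not in closed:
                    new.add(c)
        closed |= new
        frontier = new

    # For each species set, take every gene that covers all of it.
    # The resulting pair cannot be extended by any gene or species.
    results = []
    for species in closed:
        genes = frozenset(g for g, s in cols.items() if species <= s)
        if len(genes) >= min_genes:
            results.append((genes, species))
    return results


# Toy example with made-up sampling: four genes, four species.
toy = {
    "rbcL": {"A", "B", "C", "D"},
    "matK": {"A", "B", "C"},
    "atpB": {"A", "B", "D"},
    "ndhF": {"A"},
}
found = maximal_data_sets(toy, min_genes=2, min_species=2)
```

On real databases the skewed distribution of sequences mentioned in the abstract would keep the number of qualifying intersections manageable; this sketch relies on the same pruning idea (discarding any species set below the threshold) but makes no claim about matching the published running times.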

Bibliographic record

Source
Molecular biology and evolution | 2003, issue 7 | pp. 1036-1042 | 7 pages
Authors
Sanderson MJ; Driskell AC; Ree RH; Eulenstein O; Langley S
Author affiliation

Sanderson MJ, Univ Calif Davis, Sect Evolut & Ecol, Davis, CA 95616, USA;

Indexing information
Original format: PDF
Language of text: English
Chinese Library Classification: Molecular biology
Keywords
Biclique; NP-complete; Sequence concatenation; Phylogeny; Inferring complex phylogenies; Placental mammals; Molecular phylogenetics; Missing data; Trees; Genes; Angiosperms; Parsimony; Information; Supertrees
