Method and apparatus using Bayesian subfamily identification for sequence analysis
Abstract
A system and method agglomeratively estimate a phylogenetic tree from multiple sequence alignment (MSA) input data. A data model is created for each tree node by first estimating the number of independent observations in the data. A distance measure between subtree nodes, preferably relative entropy, determines which nodes to merge at each agglomeration step. The phylogenetic tree is cut at the points in the agglomeration where the encoding cost is minimized, preferably computed using Dirichlet mixture densities to assign probabilities to the observed amino acids within each subfamily at each position. Using the subtree data, a statistical model, e.g., a profile or hidden Markov model, may be constructed for each subfamily in a position-dependent manner, which permits identifying remote homologs in a database search. Further, the invention provides an alignment analysis to identify key functional or structural residues. Finally, the invention may be carried out in an automated fashion using a computer system in which a processor unit executes a storable routine embodying the preferred methodology.
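The agglomeration step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. All function names are hypothetical. A uniform pseudocount (a single-component Dirichlet prior) stands in for the Dirichlet mixture densities. The estimate of independent observations and the encoding-cost cut are omitted. The sketch only shows how a symmetric relative entropy between per-column amino acid distributions could pick the pair of subtrees to merge next.

```python
import math
from collections import Counter
from itertools import combinations

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def column_profile(seqs, pos, pseudocount=1.0):
    """Amino acid distribution at one alignment column.

    Assumption: a uniform pseudocount replaces the Dirichlet mixture
    prior named in the abstract. Gap characters are ignored.
    """
    counts = Counter(s[pos] for s in seqs if s[pos] in ALPHABET)
    total = sum(counts.values()) + pseudocount * len(ALPHABET)
    return [(counts[a] + pseudocount) / total for a in ALPHABET]


def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def subtree_distance(a, b):
    """Symmetric relative entropy between two subtrees, summed over columns."""
    total = 0.0
    for pos in range(len(a[0])):
        p = column_profile(a, pos)
        q = column_profile(b, pos)
        total += relative_entropy(p, q) + relative_entropy(q, p)
    return total


def agglomerate(msa):
    """Merge the closest pair of subtrees until one remains.

    Returns the merges in order; each entry is the pair of sequence
    lists that were joined at that step.
    """
    clusters = [[s] for s in msa]
    merges = []
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: subtree_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges
```

Calling `agglomerate` on a list of equal-length aligned sequences returns the merge order, from which a tree could be read off. Choosing where to cut that tree into subfamilies would require the encoding-cost criterion, which this sketch does not attempt.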