imPhy: Imputing Phylogenetic Trees with Missing Information using Mathematical Programming

Abstract

Advances in modern genomics allow researchers to apply phylogenetic analyses on a genome-wide scale. While large volumes of genomic data can be generated cheaply and quickly, missing data is a non-trivial and largely expected problem. Since the available information is often incomplete for a given set of genetic loci and individual organisms, a large proportion of gene trees (trees that depict the evolutionary history of a single genetic locus) fail to contain all individuals. Data incompleteness causes difficulties in data collection, information extraction, and gene tree inference. Furthermore, identifying outlying gene trees, which can represent horizontal gene transfers, gene duplications, or hybridizations, is difficult when data are missing from the gene trees. The typical approach is to remove all individuals with missing data from the gene trees and focus the analysis on individuals whose information is fully available, which discards a large amount of information. In this work, we propose and design an optimization-based imputation approach that infers the missing distances between leaves in a set of gene trees via a mixed integer non-linear programming model. We also present a new research pipeline, imPhy, that can (i) simulate a set of gene trees with leaves randomly missing in each tree, (ii) impute the missing pairwise distances in each gene tree, (iii) reconstruct the gene trees using the Neighbor Joining (NJ) and Unweighted Pair Group Method with Arithmetic Mean (UPGMA) methods, and (iv) analyze and report the efficiency of the reconstruction. To impute the missing leaves, we employ our newly proposed non-linear programming framework and demonstrate its capability in reconstructing gene trees with incomplete information in both simulated and empirical datasets. In the empirical datasets apicomplexa and lungfish, our imputation has very small normalized mean square errors, even in the extreme case where 50% of the individuals in each gene tree are missing. Data, software, and user manuals can be found at https://github.com/yasuiniko/imPhy.
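
To make the imputation and evaluation steps concrete, here is a minimal sketch in Python. It is not the mixed integer non-linear programming model the abstract describes: as a deliberately simple stand-in, it fills each missing pairwise distance with the mean of that distance over the gene trees where the pair is observed. The function names `impute_by_mean` and `nmse`, the toy distances, and the normalization used (sum of squared errors divided by the sum of squared true distances) are all assumptions made for illustration and may differ from what imPhy computes.

```python
from itertools import combinations


def impute_by_mean(trees, leaves):
    """Fill missing pairwise distances in each gene tree.

    Each tree is a dict mapping a sorted leaf pair to a distance.
    A missing pair is filled with the mean over the trees that observe it.
    This is a naive baseline, NOT the paper's optimization model.
    """
    all_pairs = list(combinations(sorted(leaves), 2))
    imputed = []
    for tree in trees:
        filled = dict(tree)
        for pair in all_pairs:
            if pair not in filled:
                observed = [t[pair] for t in trees if pair in t]
                if observed:  # leave the gap if no tree observes the pair
                    filled[pair] = sum(observed) / len(observed)
        imputed.append(filled)
    return imputed


def nmse(true, estimate, pairs):
    """Normalized mean square error over the given pairs (assumed definition)."""
    numerator = sum((true[p] - estimate[p]) ** 2 for p in pairs)
    denominator = sum(true[p] ** 2 for p in pairs)
    return numerator / denominator


# Hypothetical toy data: three gene trees over leaves A, B, C.
leaves = ["A", "B", "C"]
full_trees = [
    {("A", "B"): 2.0, ("A", "C"): 4.0, ("B", "C"): 4.0},
    {("A", "B"): 2.0, ("A", "C"): 6.0, ("B", "C"): 6.0},
    {("A", "B"): 4.0, ("A", "C"): 6.0, ("B", "C"): 6.0},
]

# Simulate leaf C missing from the third gene tree.
missing_leaf = "C"
observed_trees = [dict(t) for t in full_trees]
observed_trees[2] = {
    pair: d for pair, d in full_trees[2].items() if missing_leaf not in pair
}

imputed = impute_by_mean(observed_trees, leaves)
hidden_pairs = [pair for pair in full_trees[2] if missing_leaf in pair]
error = nmse(full_trees[2], imputed[2], hidden_pairs)
print(imputed[2], error)
```

The completed distance dictionaries could then be passed to an NJ or UPGMA implementation to rebuild each gene tree, which is the reconstruction step the abstract lists as (iii).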

Bibliographic details

  • Author: Yasui, Niko
  • Year: 2020
  • Total pages: 11
  • Original format: PDF
  • Source: U.S. Naval Postgraduate School Library
  • Collection: All documents
  • Date indexed: 2022-08-19 17:02:04