Journal: The Plant Genome

Non‐homology‐based prediction of gene functions in maize (Zea mays ssp. mays)

Abstract

Advances in genome sequencing and annotation have eased the difficulty of identifying new gene sequences. Predicting the functions of these newly identified genes remains challenging. Genes descended from a common ancestral sequence are likely to have common functions. As a result, homology is widely used for gene function prediction. This means functional annotation errors also propagate from one species to another. Several approaches based on machine learning classification algorithms were evaluated for their ability to accurately predict gene function from non‐homology gene features. Among the eight supervised classification algorithms evaluated, random‐forest‐based prediction consistently provided the most accurate gene function prediction. Non‐homology‐based functional annotation provides complementary strengths to homology‐based annotation, with higher average performance in Biological Process GO terms, the domain where homology‐based functional annotation performs the worst, and weaker performance in Molecular Function GO terms, the domain where the accuracy of homology‐based functional annotation is highest. GO prediction models trained with homology‐based annotations were able to successfully predict annotations from a manually curated “gold standard” GO annotation set. Non‐homology‐based functional annotation based on machine learning may ultimately prove useful both as a method to assign predicted functions to orphan genes which lack functionally characterized homologs, and to identify and correct functional annotation errors which were propagated through homology‐based functional annotations.
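
To make the classification setup concrete, here is a minimal sketch of predicting a single GO term as a yes/no label from non-homology features. It is not the authors' pipeline: the abstract does not list the features or the implementation, so the three toy features (two informative, one noise), the synthetic data, and every function name are assumptions made for illustration. To stay free of dependencies, the model is a bagged ensemble of one-split decision stumps with random feature subsets, a heavily simplified stand-in for a random forest; real work would more likely use an established library implementation.

```python
import random

def make_toy_genes(n=60, seed=1):
    """Synthetic 'genes': features 0 and 1 track the label, feature 2 is noise.
    All values are invented for illustration, not taken from the paper."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2  # 1 = gene carries the (hypothetical) GO term
        lo, hi = (0.7, 1.0) if label else (0.0, 0.3)
        X.append([rng.uniform(lo, hi), rng.uniform(lo, hi), rng.random()])
        y.append(label)
    return X, y

def train_stump(X, y, feature_ids):
    """Return the (feature, threshold, polarity) with the fewest training errors."""
    best = None
    for f in feature_ids:
        vals = sorted({row[f] for row in X})
        # Candidate thresholds: midpoints between consecutive observed values.
        thresholds = [(a + b) / 2 for a, b in zip(vals, vals[1:])] or vals
        for t in thresholds:
            for polarity in (1, 0):  # 1: predict 1 if value >= t; 0: if value < t
                errors = sum(
                    int((row[f] >= t) == bool(polarity)) != label
                    for row, label in zip(X, y)
                )
                if best is None or errors < best[0]:
                    best = (errors, f, t, polarity)
    return best[1:]

def stump_predict(stump, row):
    f, t, polarity = stump
    return int((row[f] >= t) == bool(polarity))

def train_forest(X, y, n_trees=25, n_sub_features=2, seed=0):
    """Bagging: each stump sees a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n, n_features = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        feats = rng.sample(range(n_features), n_sub_features)
        forest.append(train_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest

def forest_predict(forest, row):
    """Majority vote over all stumps."""
    votes = sum(stump_predict(s, row) for s in forest)
    return int(2 * votes > len(forest))

def accuracy(forest, X, y):
    hits = sum(forest_predict(forest, row) == label for row, label in zip(X, y))
    return hits / len(y)

# Train on the first 40 toy genes, evaluate on the remaining 20.
X, y = make_toy_genes()
forest = train_forest(X[:40], y[:40])
print("held-out accuracy:", accuracy(forest, X[40:], y[40:]))
```

In a multi-term setting, one such binary model would be trained per GO term, and the per-term scores could then be averaged within each GO domain (Biological Process, Molecular Function) to support the kind of comparison the abstract describes.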
