Computational mutagenesis using transduction, active learning, and association rule mining.

Abstract

Wet laboratory mutagenesis to determine enzyme mutant activity or nsSNP-induced pathology is expensive and time consuming. Automating such prediction tasks motivates in silico computational methods, i.e., computational mutagenesis. The computational methods used in this dissertation are driven by transduction, active learning, and association mining. The specific bioinformatics tasks are linked with the novel computational mutagenesis methods as follows: (1) protein function prediction using transduction; (2) protein function prediction using transduction and active learning; and (3) prediction of nsSNP-induced pathology using transduction and active learning combined with association mining. The feasibility and comparative advantage of these methods are shown on predicting mutant (single amino acid polymorphisms) activity for HIV-1 Protease (HIV-1), Bacteriophage T4 Lysozyme (T4), and Lac Repressor (LAC) proteins; and on predicting non-synonymous Single Nucleotide Polymorphism (nsSNP)-induced pathology on an nsSNP data set composed of a large number of proteins. The problem of unbalanced population, where the proportion of examples in the data set belonging to each class is uneven, is addressed using (a) stratified sampling with cross-validation operating on folds that are identical in class distribution; and (b) random over-sampling to boost the minority class and make it equal in size to the majority class. The annotation problem is a by-product of incremental transduction and active learning. The novel methods proposed in this dissertation perform better than state-of-the-art methods in terms of prediction performance (Tasks 1, 2, and 3), amount of annotation used (size of training data) (Tasks 2 and 3), and explanation (knowledge) gained (Task 3).
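The two remedies for class imbalance named in the abstract, (a) stratified cross-validation folds and (b) random over-sampling of the minority class, can be illustrated with a minimal sketch. This is not the dissertation's implementation: the function names `stratified_folds` and `random_oversample`, the toy labels, and the choice to over-sample only the training folds are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Split example indices into k folds with near-identical class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal each class's indices round-robin so every fold gets its share.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def random_oversample(indices, labels, seed=0):
    """Duplicate randomly chosen minority examples until every class
    is as large as the largest class among the given indices."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx in indices:
        by_class[labels[idx]].append(idx)
    target = max(len(members) for members in by_class.values())
    balanced = []
    for label in sorted(by_class):
        members = by_class[label]
        balanced.extend(members)
        # Sample with replacement to fill the gap to the majority size.
        balanced.extend(rng.choice(members) for _ in range(target - len(members)))
    return balanced


# Hypothetical toy data: 8 "active" mutants and 4 "inactive" ones.
labels = ["active"] * 8 + ["inactive"] * 4
folds = stratified_folds(labels, k=4)

for held_out in range(len(folds)):
    # Assumption: only the training portion is over-sampled,
    # so the held-out fold keeps the original class distribution.
    train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
    train_balanced = random_oversample(train, labels)
```

Over-sampling only the training portion is a common design choice, since duplicated minority examples in the held-out fold would distort the evaluation; the abstract does not say which variant the author used.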

Bibliographic record

  • Author

    Basit, Nada

  • Author affiliation

    George Mason University

  • Degree-granting institution George Mason University
  • Discipline Biology Bioinformatics; Computer Science
  • Degree Ph.D.
  • Year 2012
  • Pages 149 p.
  • Total pages 149
  • Original format PDF
  • Language English
