Statistical design and analysis of high throughput screening data using pooling experiments and data mining techniques.

Abstract

Discovering a new drug involves screening large chemical libraries to identify new and diverse active compounds. Only a very small percentage of the compounds in a library are active. The naive approach of testing every compound in the library is undesirable: besides being expensive, it provides little information about which aspects of the chemical structure of active compounds are related to activity.

This work investigates pooling experiments as one possible approach to improving screening efficiency and gaining insight into structure-activity relationships. Four different pooling designs are proposed using two design criteria: optimal coverage of the chemical space and minimal collision between compounds. Each method is evaluated by how well the design criteria are met and by whether the method finds many diverse active compounds. One pooling design emerges as the winner, but all designed pools clearly outperform randomly created pools. Different approaches to analyzing the pooling designs are also investigated. Multiple trees are compared with model-based likelihood approaches that use different covariate class definitions. The results show that a model-based likelihood approach with a multiple-trees-lower-bound covariate class definition gives the best performance.

Another possible approach to improving screening efficiency and gaining insight into structure-activity relationships is the use of data mining techniques such as RandomForest and ChemTree. These techniques are applied to individual compounds.
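
To make the efficiency argument for pooling concrete, the following is a minimal Python sketch of a two-stage screen with randomly created pools, the baseline the abstract compares the designed pools against. It is not the dissertation's method: the function name `two_stage_pooled_screen`, the 1% activity rate, the pool size of 10, and the assumption that a pool tests active exactly when it contains at least one active compound (no assay error, no blocking or synergy) are all illustrative assumptions.

```python
import random


def two_stage_pooled_screen(is_active, pool_size, rng):
    """Return (hits, n_tests) for a simple two-stage pooled screen.

    Assumption: a pool tests active exactly when it contains at least one
    active compound (no assay error, no blocking, no synergy).
    """
    order = list(range(len(is_active)))
    rng.shuffle(order)  # random pools, not the designed pools of the thesis
    hits, n_tests = [], 0
    for start in range(0, len(order), pool_size):
        pool = order[start:start + pool_size]
        n_tests += 1  # stage 1: one assay for the whole pool
        if any(is_active[i] for i in pool):
            # Stage 2: retest every member of an active pool individually.
            n_tests += len(pool)
            hits.extend(i for i in pool if is_active[i])
    return sorted(hits), n_tests


rng = random.Random(0)
n_compounds = 10_000
active_rate = 0.01  # hypothetical fraction of active compounds
is_active = [rng.random() < active_rate for _ in range(n_compounds)]

hits, n_tests = two_stage_pooled_screen(is_active, pool_size=10, rng=rng)
print(f"actives found: {len(hits)}; tests used: {n_tests} "
      f"(individual testing would use {n_compounds})")
```

Under these idealized assumptions every active compound is recovered with far fewer assays than testing each compound individually. The sketch says nothing about chemical-space coverage or collisions between compounds, which are the criteria the designed pools in the abstract address.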

Bibliographic record

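  • Author

    Remlinger, Katja S.

  • Author affiliation

    North Carolina State University

  • Degree-granting institution: North Carolina State University
  • Subject: Statistics; Biology Biostatistics; Health Sciences Pharmacology
  • Degree: Ph.D.
  • Year: 2004
  • Pagination: 241 p.
  • Total pages: 241
  • Original format: PDF
  • Language of text: eng
  • Chinese Library Classification: Statistics; Biomathematical methods; Pharmacology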