
Distance-based methods for detecting associations in structured data with applications in bioinformatics



Abstract

In bioinformatics applications samples of biological variables of interest can take a variety of structures. For instance, in this thesis we consider vector-valued observations of multiple gene expression and genetic markers, curve-valued gene expression time courses, and graph-valued functional connectivity networks within the brain. This thesis considers three problems routinely encountered when dealing with such variables: detecting differences between populations, detecting predictive relationships between variables, and detecting association between variables.

Distance-based approaches to these problems are considered, offering great flexibility over alternative approaches, such as traditional multivariate approaches which may be inappropriate. The notion of distance has been widely adopted in recent years to quantify the dissimilarity between samples, and suitable distance measures can be applied depending on the nature of the data and on the specific objectives of the study. For instance, for gene expression time courses modeled as time-dependent curves, distance measures can be specified to capture biologically meaningful aspects of these curves which may differ. On obtaining a distance matrix containing all pairwise distances between the samples of a given variable, many distance-based testing procedures can then be applied. The main inhibitor of their effective use in bioinformatics is that p-values are typically estimated by using Monte Carlo permutations. Thousands or even millions of tests need to be performed simultaneously, and time/computational constraints lead to a low number of permutations being enumerated for each test.

The contributions of this thesis include the proposal of two new distance-based statistics, the DBF statistic for the problem of detecting differences between populations, and the GRV coefficient for the problem of detecting association between variables. In each case approximate null distributions are derived, allowing estimation of p-values with reduced computational cost, and through simulation these are shown to work well for a range of distances and data types. The tests are also demonstrated to be competitive with existing approaches. For the problem of detecting predictive relationships between variables, the approximate null distribution is derived for the routinely used distance-based pseudo F test, and through simulation this is shown to work well for a range of distances and data types. All tests are applied to real datasets, including a longitudinal human immune cell M. tuberculosis dataset, an Alzheimer's disease dataset, and an ovarian cancer dataset.
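The Monte Carlo permutation baseline that the abstract describes can be illustrated with a minimal sketch. It is not taken from the thesis and does not implement the approximate null distributions, the DBF statistic, or the GRV coefficient. It assumes the commonly used form of the distance-based pseudo F statistic, tr(HGH) / tr((I-H)G(I-H)) scaled by degrees of freedom, where G is the Gower-centred matrix of squared distances and H is the hat matrix of the predictors. The function names (`pairwise_euclidean`, `gower_center`, `pseudo_f`, `permutation_pvalue`) are hypothetical, and Euclidean distance is only a placeholder for whatever distance suits the data.

```python
import numpy as np


def pairwise_euclidean(Y):
    """Return the n x n matrix of Euclidean distances between rows of Y."""
    diff = Y[:, None, :] - Y[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def gower_center(D):
    """Gower-centre a distance matrix: G = -1/2 * C D^2 C, C = I - 11'/n."""
    n = D.shape[0]
    C = np.eye(n) - np.ones((n, n)) / n
    return -0.5 * C @ (D ** 2) @ C


def pseudo_f(G, X):
    """Pseudo F statistic for an n x p predictor matrix X (no intercept column)."""
    n = X.shape[0]
    Xd = np.column_stack([np.ones(n), X])  # design matrix with intercept
    H = Xd @ np.linalg.pinv(Xd)            # projection onto the column space
    R = np.eye(n) - H
    m = np.linalg.matrix_rank(Xd)
    explained = np.trace(H @ G @ H) / (m - 1)
    residual = np.trace(R @ G @ R) / (n - m)
    return explained / residual


def permutation_pvalue(D, X, n_perm=999, seed=0):
    """Estimate a p-value by permuting the rows of X (assumed scheme)."""
    rng = np.random.default_rng(seed)
    G = gower_center(D)
    observed = pseudo_f(G, X)
    n = D.shape[0]
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(n)
        if pseudo_f(G, X[perm]) >= observed:
            exceed += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return observed, (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    Y = rng.normal(size=(20, 5))                   # synthetic responses
    Y[10:] += 1.0                                  # shift the second group
    x = np.repeat([0.0, 1.0], 10)[:, None]         # group indicator
    f_stat, p = permutation_pvalue(pairwise_euclidean(Y), x, n_perm=199)
    print(f_stat, p)
```

With this estimator the smallest attainable p-value is 1 / (n_perm + 1), so a small permutation budget cannot produce p-values low enough to survive a correction over thousands of simultaneous tests. This is the computational constraint that motivates deriving approximate null distributions.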

Bibliographic details

  • Author

    Minas Christopher

  • Year: 2013
  • Original format: PDF
