Methods for Statistical Association Mining by Variable-to-Set Affinity Testing

Abstract

Statistical data mining refers to methods for identifying and validating interesting patterns from an overabundance of data. Data mining tasks in which the objective involves pairwise relationships between variables are known as association mining. In general, features sought by association mining methods are sets of variables, often small subsets of a larger collection, that are more associated internally than externally. Methods vary in both the measure of association that is studied and the algorithm by which associated sets are identified. This dissertation provides a generalized framework for association mining called Variable-to-Set Affinity Testing (VSAT). Unlike conventional techniques for clustering or community detection, which usually maximize a score from a dissimilarity or adjacency matrix, the VSAT approach is an adaptive procedure grounded in statistical hypothesis testing principles. The framework is adaptable to a broad class of measurements for variable relationships, and is equipped with theoretical guarantees of error control.

This dissertation also presents in detail two new association mining methods built in the VSAT framework. The first, Differential Correlation Mining (DCM), identifies variable sets that have higher average pairwise correlation in one sample condition than in another. Such artifacts are of scientific interest in many fields, including statistical genetics and neuroscience. Differential Correlation Mining is applied to high-dimensional data sets in these two fields.

The second method, Coherent Set Mining (CSM), is a novel approach to association mining in binary data. Dichotomous observations are assumed to derive from a latent variable of interest via thresholding. The Coherent Set Mining method identifies variable sets that are strongly associated in the latent measure, despite distortions in the association structure of the observed data due to the thresholding process. Coherent Set Mining is applied to problems in text mining, statistical genetics, and product recommendation.
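
The quantity that Differential Correlation Mining targets, as described in the abstract, is the gap in average pairwise correlation of a variable set between two sample conditions. The sketch below computes only that descriptive score on simulated data. It is not the dissertation's DCM procedure: the hypothesis-testing search and error control of VSAT are not reproduced here, and the function names, sample sizes, and the planted set are assumptions made for illustration.

```python
import numpy as np


def mean_pairwise_correlation(data, variables):
    """Average off-diagonal Pearson correlation among the chosen columns."""
    corr = np.corrcoef(data[:, variables], rowvar=False)
    k = len(variables)
    # Remove the diagonal (all ones), then average the k*(k-1) remaining entries.
    off_diagonal_sum = corr.sum() - np.trace(corr)
    return off_diagonal_sum / (k * (k - 1))


def differential_correlation(cond1, cond2, variables):
    """Hypothetical helper: mean correlation in condition 1 minus condition 2."""
    return (mean_pairwise_correlation(cond1, variables)
            - mean_pairwise_correlation(cond2, variables))


# Simulated data (an assumption, not from the dissertation):
# variables 0-2 share a common signal in condition 1 only.
rng = np.random.default_rng(0)
n, p = 500, 6
shared = rng.normal(size=(n, 1))
cond1 = rng.normal(size=(n, p))
cond1[:, :3] += 2.0 * shared
cond2 = rng.normal(size=(n, p))

score_planted = differential_correlation(cond1, cond2, [0, 1, 2])
score_null = differential_correlation(cond1, cond2, [3, 4, 5])
print(f"planted set: {score_planted:.3f}, unrelated set: {score_null:.3f}")
```

The planted set should score well above the unrelated set. A mining method still has to decide which sets to examine and whether a given gap is statistically significant, which is the role the abstract assigns to the VSAT framework.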

Bibliographic details

Author: Bodwin, Kelly Nicole
Author affiliation: The University of North Carolina at Chapel Hill
Degree-granting institution: The University of North Carolina at Chapel Hill
Subject: Statistics
Degree: Ph.D.
Year: 2017
Pages: 126
Original format: PDF
Language: English

