Clustering with Flexible Constraints and Application to Disease Subtyping


Abstract

Clustering algorithms are widely used to extract knowledge from large amounts of unlabeled data (for example, discovering subtypes of complex diseases to enable personalized treatment of patients). Clustering is a challenging problem because, given the same data, samples can be grouped from multiple different perspectives (views). Which of these alternative groupings is useful depends on the application. Thus, incorporating domain expert input often improves clustering performance. In this dissertation, we explore various ways to incorporate expert input to guide clustering. First, domain experts often have an idea of the properties that a clustering solution should have in order to be useful, based on domain-relevant scores. We propose a framework to jointly optimize the usefulness and quality of a clustering solution. Second, besides instance-level constraints, feature-level structures can also be utilized to improve clustering. We consider two types of feature-level structures: 1) decision rules on a small set of features to provide interpretable clusterings; and 2) a feature similarity matrix used to guide the embeddings for clustering. Third, instead of supervision from one expert, it is becoming more common for supervision to be available from multiple experts, as data can be shared and processed by increasingly large audiences. To address this new clustering paradigm, we make the following contributions: 1) Because experts are not oracles, their inputs are prone to errors. We build a probabilistic model to learn the shared latent clustering structure in the data by explicitly modeling the accuracy of each expert. 2) Since different experts might provide supervision with varying views in mind, we build a Bayesian probabilistic model for learning multiple latent clustering views from multiple experts.
Besides demonstrating the superior performance of our proposed approaches on synthetic and benchmark data sets, we also applied them to discover subtypes of a complex lung disease, chronic obstructive pulmonary disease (COPD), and obtained clinically meaningful results.
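
The idea of accounting for each expert's accuracy can be illustrated with a small sketch. This is not the dissertation's probabilistic model. It is a simplified stand-in that assumes supervision arrives as pairwise must-link / cannot-link constraints, scores each expert by agreement with a given clustering, and down-weights unreliable experts. The function names, the expert names "A" and "B", and the toy data are all hypothetical.

```python
from collections import defaultdict


def expert_accuracy(labels, constraints):
    """Fraction of one expert's pairwise constraints that agree with `labels`.

    Each constraint is (i, j, must_link): must_link=True says samples i and j
    belong together, must_link=False says they belong apart.
    """
    if not constraints:
        return 0.5  # no evidence: treat the expert as uninformative
    agree = sum(
        (labels[i] == labels[j]) == must_link for i, j, must_link in constraints
    )
    return agree / len(constraints)


def weighted_pair_votes(experts, accuracies):
    """Combine all experts' constraints into one signed score per sample pair.

    An expert with accuracy a contributes weight (2a - 1), so a coin-flip
    expert (a = 0.5) contributes nothing. Positive scores favor must-link,
    negative scores favor cannot-link.
    """
    votes = defaultdict(float)
    for name, constraints in experts.items():
        weight = 2.0 * accuracies[name] - 1.0
        for i, j, must_link in constraints:
            pair = (min(i, j), max(i, j))
            votes[pair] += weight if must_link else -weight
    return dict(votes)


# Toy data (assumed): six samples in two groups, two hypothetical experts.
labels = [0, 0, 0, 1, 1, 1]
experts = {
    "A": [(0, 1, True), (1, 2, True), (0, 3, False), (4, 5, True)],
    "B": [(0, 1, False), (2, 3, True), (3, 4, True), (1, 5, False)],
}

accuracies = {name: expert_accuracy(labels, c) for name, c in experts.items()}
votes = weighted_pair_votes(experts, accuracies)
```

In a full method the clustering and the expert accuracies would be inferred jointly rather than computed from a fixed labeling, as is done here for brevity.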


Bibliographic record

Author: Chang, Yale
Author affiliation: Northeastern University
Degree-granting institution: Northeastern University
Subject: Artificial intelligence
Degree: Ph.D.
Year: 2017
Pages: 132
Original format: PDF
Language of text: English
Date added to database: 2022-08-17 11:54:24

