
Clustering with Flexible Constraints and Application to Disease Subtyping



Abstract

Clustering algorithms are widely used to extract knowledge from large amounts of unlabeled data, for example to discover subtypes of complex diseases and so enable personalized treatment of patients. Clustering is a challenging problem because the same data can be grouped in multiple ways, each reflecting a different perspective (view). Which of these alternative groupings is useful depends on the application. Thus, incorporating domain expert input often improves clustering performance. In this dissertation, we explore various ways to incorporate expert input to guide clustering. First, domain experts often have an idea of the properties that a clustering solution should have in order to be useful, as measured by domain-relevant scores. We propose a framework to jointly optimize the usefulness and quality of a clustering solution. Second, besides instance-level constraints, feature-level structures can also be utilized to improve clustering. We consider two types of feature-level structures: 1) decision rules on a small set of features, which provide interpretable clusterings; and 2) a feature similarity matrix used to guide the embeddings for clustering. Third, instead of supervision from one expert, it is becoming more common for supervision to be available from multiple experts, as data can be shared and processed by increasingly large audiences. To address this new clustering paradigm, we make the following contributions: 1) Because experts are not oracles, their inputs are prone to errors. We build a probabilistic model to learn the shared latent clustering structure in the data by explicitly modeling the accuracy of each expert. 2) Since different experts might provide supervision with different views in mind, we build a Bayesian probabilistic model for learning multiple latent clustering views from multiple experts.
Besides demonstrating the superior performance of our proposed approaches on synthetic and benchmark data sets, we also applied them to discover subtypes of a complex lung disease, chronic obstructive pulmonary disease (COPD), and obtained clinically meaningful results.

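Bibliographic record

  • Author

    Chang, Yale.;

  • Author affiliation

    Northeastern University.;

  • Degree-granting institution Northeastern University.;
  • Subject Artificial intelligence.
  • Degree Ph.D.
  • Year 2017
  • Pages 132 p.
  • Total pages 132
  • Original format PDF
  • Language of text eng
  • Chinese Library Classification
  • Keywords

  • Date indexed 2022-08-17 11:54:24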
