Predictive Learning with Heterogeneity in Populations

Abstract

Predictive learning forms the backbone of several data-driven systems powering scientific as well as commercial applications, e.g., filtering spam messages, detecting faces in images, forecasting health risks, and mapping ecological resources. However, one of the major challenges in applying standard predictive learning methods in real-world applications is heterogeneity in populations of data instances, i.e., different groups (or populations) of data instances exhibit different predictive relationships. For example, different populations of human subjects may show different risks for a disease even if they have similar diagnosis reports, depending on their ethnic profiles, medical history, and lifestyle choices. In the presence of population heterogeneity, a central challenge is that the training data comprises instances belonging to multiple populations, and the instances in the test set may be from a different population than the training instances. This limits the effectiveness of standard predictive learning frameworks, which assume that the instances are independent and identically distributed (i.i.d.), an assumption that holds only in simplistic settings.

This thesis introduces several ways of learning predictive models in the presence of population heterogeneity by incorporating information about the context of every data instance, which is available in varying types and formats in different application settings. It introduces a novel multi-task learning framework for problems where we have access to some ancillary variables that can be grouped to produce homogeneous partitions of data instances, thus addressing the heterogeneity in populations. This thesis also introduces a novel strategy for constructing mode-specific ensembles in binary classification settings, where each class shows a multi-modal distribution due to the heterogeneity in its populations. When the context of data instances is implicitly defined such that the test data is known to comprise contextually similar groups, this thesis presents a novel framework for adapting classification decisions using the group-level properties of test instances. This thesis also builds the foundations of a novel paradigm of scientific discovery, termed theory-guided data science, that seeks to explore the full potential of data science methods without ignoring the treasure of knowledge contained in scientific theories and principles.

Bibliographic record

  • Author: Karpatne, Anuj
  • Author affiliation: University of Minnesota
  • Degree-granting institution: University of Minnesota
  • Subject: Computer science; Artificial intelligence
  • Degree: Ph.D.
  • Year: 2017
  • Pages: 133
  • Original format: PDF
  • Language: English