A theory of multitask learning for learning from disparate data sources.

Abstract

Many endeavors require the integration of data from multiple data sources. One major obstacle to such undertakings is that different sources may vary considerably in the way they choose to represent their data, even if their data collections are otherwise perfectly compatible. In practice, this problem is usually solved by manually constructing translations between these data representations, although there have been some recent attempts at supplementing this with automated algorithms based on machine learning methods.

This work addresses the problem of making classification predictions based on data from multiple sources, without constructing explicit translations between them. We view this problem as a special case of the multitask learning problem: both intuition and much empirical work indicate that learning can be improved by attacking multiple related tasks simultaneously. However, thus far, no theoretical work has been able to support this claim, and no concrete definition has been proposed for what it means for two learning tasks to be “related.”

In this work, we introduce a general notion of relatedness between tasks, provide the standard sort of information complexity bound for such tasks, and give general conditions under which this bound is an improvement over standard single-task learning results.

Finally, we apply these results to the problem of learning from disparate data sources. We give a decision tree learning algorithm for this problem for a particular type of data source disparity and demonstrate its empirical success on real data sets.

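Bibliographic record

  • Author Schuller, Rebecca Ann
  • Author affiliation Cornell University
  • Degree-granting institution Cornell University
  • Subject Computer Science; Mathematics
  • Degree Ph.D.
  • Year 2003
  • Page p. 4467
  • Total pages 106
  • Original format PDF
  • Language of text English (eng)
  • Chinese Library Classification Automation technology, computer technology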
