New kernel functions and learning methods for text and data mining

Abstract

Recent advances in machine learning increasingly enable the automatic construction of various kinds of computer-assisted methods that have been difficult or laborious for human experts to program. The tasks that call for such tools arise in many areas; here they come especially from bioinformatics and natural language processing. Machine learning methods may not work satisfactorily unless they are appropriately tailored to the task in question. However, their learning performance can often be improved by exploiting deeper insight into the application domain or the learning problem at hand. This thesis develops kernel-based learning algorithms that incorporate this kind of prior knowledge about the task in an advantageous way. Moreover, computationally efficient algorithms for training the learning machines for specific tasks are presented.

In the context of kernel-based learning methods, prior knowledge is often incorporated by designing appropriate kernel functions. Another well-known way is to develop cost functions that fit the task under consideration. For disambiguation tasks in natural language, we develop kernel functions that take into account positional information and the mutual similarities of words. It is shown that the use of this information significantly improves the disambiguation performance of the learning machine. Further, we design a new cost function that is better suited to information retrieval and to more general ranking problems than the cost functions designed for regression and classification. We also consider other applications of kernel-based learning algorithms, such as text categorization and pattern recognition in differential display.

We develop computationally efficient algorithms for training the considered learning machines with the proposed kernel functions. We also design a fast cross-validation algorithm for regularized least-squares type learning algorithms. Further, we propose an efficient version of the regularized least-squares algorithm that can be used together with the new cost function for preference learning and ranking tasks. In summary, we demonstrate that incorporating prior knowledge is possible and beneficial, and that novel advanced kernels and cost functions can be used efficiently in algorithms.
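To make the idea of fast cross-validation for regularized least-squares concrete, the following is a minimal sketch of the widely known leave-one-out shortcut for kernel regularized least-squares. It is not taken from the thesis and may differ from the algorithm proposed there. The function names, the Gaussian kernel, and the parameters `gamma` and `lam` are illustrative assumptions. With G = (K + lam·I)⁻¹ and a = G·y, the held-out prediction for example i is y_i − a_i / G_ii, so all n leave-one-out predictions follow from a single matrix inversion.

```python
import numpy as np


def rbf_kernel(X, Z, gamma=1.0):
    """Gaussian kernel matrix between rows of X and rows of Z (assumed kernel choice)."""
    sq = (X ** 2).sum(1)[:, None] + (Z ** 2).sum(1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def rls_fit(K, y, lam):
    """Kernel regularized least-squares: return dual coefficients a and G = (K + lam*I)^-1."""
    n = K.shape[0]
    G = np.linalg.inv(K + lam * np.eye(n))
    return G @ y, G


def loo_predictions_fast(K, y, lam):
    """All leave-one-out predictions from one inversion: y_i - a_i / G_ii."""
    a, G = rls_fit(K, y, lam)
    return y - a / np.diag(G)


def loo_predictions_naive(K, y, lam):
    """Reference implementation: retrain n times, each time holding out one example."""
    n = K.shape[0]
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        a_i, _ = rls_fit(K[np.ix_(keep, keep)], y[keep], lam)
        preds[i] = K[i, keep] @ a_i
    return preds
```

The naive routine retrains the model n times, while the shortcut reuses one inverse; comparing the two on a small random data set is a simple way to check that they agree up to numerical error.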
Bibliographic record

  • Author

    Pahikkala Tapio;

  • Author affiliation
  • Year 2008
  • Total pages
  • Original format PDF
  • Text language en
  • Chinese Library Classification
