Geometric methods in machine learning and data mining.

Abstract

In machine learning, the standard goal is to find an appropriate statistical model from a model space based on training data from a data space, while in data mining the goal is to find interesting patterns in the data from a data space. In both fields, these spaces carry geometric structures that can be exploited by methods designed around them (we shall call them geometric methods), or the problems themselves can be formulated in a way that naturally appeals to these methods. In such cases, studying these geometric structures and then using appropriate geometric methods not only gives insight into existing algorithms, but also helps build new and better algorithms. In my research, I develop methods that exploit the geometric structure of a variety of machine learning and data mining problems, and provide strong theoretical and empirical evidence in favor of using them.

My dissertation is divided into two parts. In the first part, I develop algorithms to solve a well-known problem in data mining, namely the distance embedding problem. In particular, I use tools from computational geometry to build a unified framework for solving a distance embedding problem known as multidimensional scaling (MDS). This geometry-inspired framework results in algorithms that can solve different variants of MDS better than previous state-of-the-art methods. In addition, these algorithms come with many other attractive properties: they are simple, intuitive, easily parallelizable, scalable, and can handle missing data. Furthermore, I extend my unified MDS framework to build scalable algorithms for dimensionality reduction, and also to solve a sensor network localization problem for mobile sensors. Experimental results show the effectiveness of this framework across all problems.

In the second part of my dissertation, I turn to problems in machine learning. In particular, I use geometry to reason about conjugate priors, develop a model that hybridizes discriminative and generative frameworks, and build a new set of generative-process-driven kernels. More specifically, this part of my dissertation is devoted to the study of the geometry of the space of probabilistic models associated with statistical generative processes. This study, which is grounded in the theory of information geometry, allows me to reason about the appropriateness of conjugate priors from a geometric perspective, and hence gain insight into the large number of existing models that rely on these priors. Furthermore, I use this study to build hybrid models more naturally, i.e., by combining discriminative and generative methods using the geometry underlying them, and also to build a family of kernels called generative kernels that can be used as an off-the-shelf tool in any kernel learning method such as support vector machines. My experiments with generative kernels demonstrate their effectiveness, providing further evidence in favor of using geometric methods.
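The abstract does not spell out the MDS algorithms themselves. As background only, the following is a minimal sketch of the generic distance embedding problem, solved here by plain gradient descent on raw stress. It is not the dissertation's framework; the function names, the step size, and the dictionary-based input format are all assumptions made for illustration. Pairs absent from the input are simply skipped, which is one simple way to tolerate missing distances.

```python
import math
import random


def stress(points, dist):
    """Raw stress: sum of squared gaps between embedded and target distances."""
    return sum(
        (math.dist(points[i], points[j]) - d) ** 2 for (i, j), d in dist.items()
    )


def mds_gradient_descent(n, dist, dim=2, steps=2000, lr=0.05, seed=0):
    """Embed n points in `dim` dimensions by gradient descent on raw stress.

    `dist` maps index pairs (i, j) to target distances. Pairs that are
    absent are treated as missing and contribute nothing to the objective.
    This is an illustrative baseline, not the method from the dissertation.
    """
    rng = random.Random(seed)
    points = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]
    for _ in range(steps):
        grads = [[0.0] * dim for _ in range(n)]
        for (i, j), d in dist.items():
            cur = math.dist(points[i], points[j])
            if cur < 1e-12:
                continue  # gradient is undefined for coincident points
            coef = 2.0 * (cur - d) / cur
            for k in range(dim):
                g = coef * (points[i][k] - points[j][k])
                grads[i][k] += g
                grads[j][k] -= g
        for i in range(n):
            for k in range(dim):
                points[i][k] -= lr * grads[i][k]
    return points


if __name__ == "__main__":
    # Hypothetical input: pairwise distances of a 3-4-5 right triangle.
    target = {(0, 1): 3.0, (0, 2): 4.0, (1, 2): 5.0}
    embedding = mds_gradient_descent(3, target)
    print("final stress:", stress(embedding, target))
```

Gradient descent is used here only because it is short and easy to follow; it can stall in local minima on harder inputs, which is part of what motivates more careful MDS solvers.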

Bibliographic record

Author: Agarwal, Arvind
Author affiliation: University of Maryland, College Park
Degree-granting institution: University of Maryland, College Park
Subject: Computer Science
Degree: Ph.D.
Year: 2012
Pages: 229 p.
Original format: PDF
Language: English (eng)
