Semi-supervised machine learning techniques for classification of evolving data in pattern recognition

Abstract

The amount of data recorded and processed over recent years has increased exponentially. To create intelligent systems that can learn from this data, we need to be able to identify patterns hidden in the data itself, learn these patterns, and predict future results based on our current observations. If we think about this system in the context of time, the data itself evolves and so does the nature of the classification problem. As more data become available, different classification algorithms become suitable for a given setting. At the beginning of the learning cycle, when we have a limited amount of data, online learning algorithms are more suitable. When truly large amounts of data become available, we need algorithms that can handle data that might be only partially labeled, because human labeling of the data is a bottleneck in the learning pipeline.

An excellent example of evolving data is gesture recognition, and it is present throughout our work. We need a gesture recognition system to work fast and with very few examples at the beginning. Over time, we are able to collect more data and the system can improve. As the system evolves, the user expects it to work better and does not want to become involved when the classifier is unsure about decisions. This latter situation produces additional unlabeled data. Another example of an application is medical classification, where experts’ time is a rare resource and the amount of received and labeled data disproportionately increases over time.

Although the process of data evolution is continuous, we identify three main discrete areas of contribution in different scenarios. When the system is very new and not enough data are available, online learning is used to learn after every single example and to capture the knowledge very fast. With increasing amounts of data, offline learning techniques are applicable. Once the amount of data is overwhelming and the teacher cannot provide labels for all the data, we have another setup that combines labeled and unlabeled data. These three setups define our areas of contribution, and our techniques contribute to each of them with applications to pattern recognition scenarios such as gesture recognition and sketch recognition.

An online learning setup significantly restricts the range of techniques that can be used. In our case, the selected baseline technique is the Evolving TS-Fuzzy Model. The semi-supervised aspect we use is a relation between rules created by this model. Specifically, we propose a transductive similarity model that utilizes the relationship between generated rules based on their decisions about a query sample at inference time. The activation of each of these rules is adjusted according to the transductive similarity, and the new decision is obtained using the adjusted activation. We also propose several new variations of the transductive similarity itself.

Once the amount of data increases, we are not limited to the online learning setup, and we can take advantage of the offline learning scenario, which normally performs better than the online one because of the independence of sample ordering and global optimization with respect to all samples. We use generative methods to obtain data outside of the training set. Specifically, we aim to improve the previously mentioned TS-Fuzzy Model by incorporating semi-supervised learning in the offline learning setup without unlabeled data. We use the Universum learning approach and have developed a method called UFuzzy. This method relies on artificially generated examples with high uncertainty (the Universum set), and it adjusts the cost function of the algorithm to force the decision boundary to be close to the Universum data. We were able to prove the hypothesis behind the design of the UFuzzy classifier, namely that Universum learning can improve the TS-Fuzzy Model, and have achieved improved performance on more than two dozen datasets and applications.

With increasing amounts of data, we use the last scenario, in which the data comprise both labeled data and additional unlabeled data. This setting is one of the most common ones for semi-supervised learning problems. In this part of our work, we aim to improve the widely popular techniques of self-training (and its successor, help-training), both of which are meta-frameworks over regular classifier methods but require a probabilistic representation of the output, which can be hard to obtain in the case of discriminative classifiers. Therefore, we develop a new algorithm that uses a modified version of the active learning technique Query-by-Committee (QbC) to sample data with high certainty from the unlabeled set and subsequently embed them into the original training set. Our new method allows us to achieve increased performance over both a range of datasets and a range of classifiers.

These three works are connected by gradually relaxing the constraints on the learning setting in which we operate. Although our main motivation behind the development was to increase performance in various real-world tasks (gesture recognition, sketch recognition), we formulated our work as general methods in such a way that they can be used outside a specific application setup, the only restriction being that the underlying data evolve over time. Each of these methods can successfully exist on its own. The best setting in which they can be used is a learning problem where the data evolve over time and it is possible to discretize the evolutionary process.

Overall, this work represents a significant contribution to the areas of both semi-supervised learning and pattern recognition. It presents new state-of-the-art techniques that outperform baseline solutions, and it opens up new possibilities for future research.
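
The abstract describes the QbC-based variant of self-training only in outline, so the following Python sketch is an illustration under stated assumptions rather than the thesis's algorithm. It assumes a committee built from bootstrap resamples of the labeled set, a nearest-centroid base learner, and a unanimous-vote acceptance rule; the names `fit_centroids`, `predict`, `qbc_self_training`, `n_members`, and `min_agreement` are all hypothetical. The point it shows is that committee agreement can stand in for the probabilistic output that plain self-training needs.

```python
import random
from collections import Counter


def fit_centroids(X, y):
    """Nearest-centroid base learner: returns {label: mean feature vector}."""
    counts = Counter(y)
    sums = {}
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for d, value in enumerate(xi):
            acc[d] += value
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, x):
    """Return the label whose centroid is closest to x (squared Euclidean)."""
    return min(
        centroids,
        key=lambda label: sum((c - v) ** 2 for c, v in zip(centroids[label], x)),
    )


def qbc_self_training(X_lab, y_lab, X_unl, n_members=5, min_agreement=1.0,
                      max_rounds=10, seed=0):
    """Grow the labeled set with unlabeled samples the committee agrees on.

    Assumption: "high certainty" is approximated by the fraction of committee
    members voting for the majority label (no probabilistic output needed).
    """
    rng = random.Random(seed)
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unl)
    for _ in range(max_rounds):
        if not pool:
            break
        # Build the committee: each member sees a bootstrap resample.
        n = len(X_lab)
        committee = []
        for _ in range(n_members):
            idx = [rng.randrange(n) for _ in range(n)]
            committee.append(
                fit_centroids([X_lab[i] for i in idx], [y_lab[i] for i in idx])
            )
        # Move confidently labeled samples from the pool into the training set.
        remaining, added = [], 0
        for x in pool:
            votes = Counter(predict(member, x) for member in committee)
            label, count = votes.most_common(1)[0]
            if count / n_members >= min_agreement:
                X_lab.append(x)
                y_lab.append(label)
                added += 1
            else:
                remaining.append(x)
        pool = remaining
        if added == 0:  # nothing new was accepted, so stop early
            break
    return fit_centroids(X_lab, y_lab), X_lab, y_lab, pool


if __name__ == "__main__":
    X_lab = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
    y_lab = ["a", "a", "a", "b", "b", "b"]
    X_unl = [(0.3, 0.2), (4.9, 5.3), (0.1, 0.1), (5.1, 5.0)]
    model, X_all, y_all, pool = qbc_self_training(X_lab, y_lab, X_unl)
    print(len(X_all), "labeled samples,", len(pool), "left unlabeled")
```

Lowering `min_agreement` admits more unlabeled samples per round at the cost of noisier pseudo-labels; any base learner with fit and predict steps could replace the nearest-centroid stand-in.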

Record details

  • Author

    Tencer Lukas;

  • Author affiliation
  • Year: 2017
  • Total pages
  • Original format: PDF
  • Text language: en
  • Chinese Library Classification
