Journal: International Journal of Approximate Reasoning

Combining labelled and unlabelled data in the design of pattern classification systems

Abstract

There has been much interest in applying techniques that incorporate knowledge from unlabelled data into a supervised learning system, but less effort has been made to compare the effectiveness of different approaches and to analyse the behaviour of the learning system when using different ratios of labelled to unlabelled data. In this paper, various methods for learning from labelled and unlabelled data are first discussed and categorised into three major groups: pre-labelling, post-labelling and semi-supervised approaches. Their generalised formal description and extensive experimental analysis are then provided. The experimental results show that, when supported by unlabelled samples, much less labelled data is generally required to build a classifier without compromising classification performance. If only a very limited amount of labelled data is available, the results based on random selection of labelled samples show high variability, and the performance of the final classifier depends more on how reliable the labelled data samples are than on the use of additional unlabelled data. In response to this finding, three types of static (one-step) selection methods guided by clustering information, together with various options for allocating a number of samples within clusters and their distributions, have been proposed and analysed. A significant improvement over random selection of the labelled samples has been observed when using these selective sampling techniques.
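
The abstract does not specify the three selection methods, so the following is only an illustrative sketch of the general idea of static, cluster-guided selection of samples to label, not the paper's own algorithm. It assumes a plain k-means clustering, an equal (round-robin) allocation of the labelling budget across clusters, and a "closest to the centroid first" rule within each cluster. The names `kmeans`, `select_to_label`, `k` and `budget` are hypothetical and chosen for this example.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))

def kmeans(points, k, iters=20, seed=0):
    """Basic k-means (an assumed clustering step, not the paper's)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct data points as initial centroids
    for _ in range(iters):
        assign = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, [nearest(p, centroids) for p in points]

def select_to_label(points, k, budget, seed=0):
    """Return indices of up to `budget` points to send for labelling.

    One-step (static) selection: cluster once, rank each cluster's members
    by distance to its centroid, then take points round-robin across clusters.
    """
    centroids, assign = kmeans(points, k, seed=seed)
    ranked = []
    for c in range(k):
        members = [i for i, a in enumerate(assign) if a == c]
        members.sort(key=lambda i: sq_dist(points[i], centroids[c]))
        ranked.append(members)
    chosen, rank = [], 0
    while len(chosen) < budget and any(rank < len(r) for r in ranked):
        for r in ranked:
            if rank < len(r) and len(chosen) < budget:
                chosen.append(r[rank])
        rank += 1
    return chosen

# Toy data: two well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
print(select_to_label(points, k=2, budget=4))
```

Compared with drawing the labelled samples at random, this kind of selection guarantees that every cluster contributes labelled points, which is one plausible way to reduce the variability the abstract reports for very small labelled sets. Other allocation rules (for example, proportional to cluster size) would slot into the round-robin loop.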
