Journal: Pattern Recognition: The Journal of the Pattern Recognition Society

Multi-label classification via incremental clustering on an evolving data stream

Abstract

With the advancement of storage and processing technology, an enormous amount of data is collected on a daily basis in many applications. Advanced data analytics is now used to mine the collected data for useful information and to make predictions, contributing to companies' competitive advantage. The increasing data volume, however, poses many problems for classical batch learning systems, such as the need to retrain the model completely when new samples arrive and the impracticality of storing and accessing a large volume of data. This has prompted interest in incremental learning that operates on data streams. In this study, we develop an incremental online multi-label classification (OMLC) method based on a weighted clustering model. The model adapts to changes in the data via a decay mechanism in which each sample's weight diminishes over time, so the clustering model always focuses more on newly arrived samples. In the classification process, only clusters whose weights are greater than a threshold (called mature clusters) are used to assign labels to the samples. In our method, not only is the clustering model incrementally maintained with the revealed ground-truth labels of the arrived samples, but the number of predicted labels for a sample is also adjusted based on the Hoeffding inequality and the label cardinality. The experimental results show that our method is competitive with several well-known benchmark algorithms on six performance measures in both the stationary and the concept drift settings. © 2019 Elsevier Ltd. All rights reserved.
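The decay and maturity ideas in the abstract can be illustrated with a minimal sketch. The abstract does not give the decay formula, so the exponential fading function `2^(-decay_rate * elapsed)` used here is an assumption, and the names `Cluster`, `decayed_weight`, `add_sample`, `mature_clusters`, and `hoeffding_bound` are hypothetical. The last function is the standard Hoeffding bound; how the paper combines it with label cardinality to adjust the number of predicted labels is not stated in the abstract and is not modelled here.

```python
import math
from dataclasses import dataclass


@dataclass
class Cluster:
    weight: float       # decayed sum of the weights of absorbed samples
    last_update: float  # time stamp of the most recent update


def decayed_weight(weight: float, elapsed: float, decay_rate: float) -> float:
    """Assumed exponential fading: weight * 2^(-decay_rate * elapsed)."""
    return weight * 2.0 ** (-decay_rate * elapsed)


def add_sample(cluster: Cluster, now: float, decay_rate: float) -> None:
    """Decay the cluster weight up to `now`, then absorb one sample of weight 1."""
    elapsed = now - cluster.last_update
    cluster.weight = decayed_weight(cluster.weight, elapsed, decay_rate) + 1.0
    cluster.last_update = now


def mature_clusters(clusters, now, decay_rate, threshold):
    """Keep only clusters whose decayed weight at `now` exceeds the threshold."""
    return [
        c for c in clusters
        if decayed_weight(c.weight, now - c.last_update, decay_rate) > threshold
    ]


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Standard Hoeffding bound: sqrt(R^2 * ln(1/delta) / (2n)).

    With probability at least 1 - delta, the mean of n observations with
    range R lies within this distance of the true mean (one-sided).
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))
```

Under these assumptions, a cluster that stops receiving samples loses half its weight every `1 / decay_rate` time units, so it eventually drops below the maturity threshold and no longer contributes to label assignment.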