Clustering - What Both Theoreticians and Practitioners are Doing Wrong

Abstract

Unsupervised learning is widely recognized as one of the most important challenges facing machine learning today. However, although hundreds of papers on the topic are published every year, our current theoretical understanding and practical implementations of such tasks, in particular of clustering, are very rudimentary. This note focuses on clustering. The first challenge I address is model selection: how should a user pick an appropriate clustering tool for a given clustering problem, and how should the parameters of such a tool be tuned? In contrast with other common computational tasks, different clustering algorithms often yield drastically different outcomes on the same input. Therefore, the choice of clustering algorithm may play a crucial role in how useful the resulting clustering is. However, there is currently no methodical guidance for selecting a clustering tool for a given clustering task. I argue that this problem is severe and describe some recent proposals that aim to address this crucial lacuna.
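To make the claim that different algorithms can disagree sharply a little more concrete, here is a minimal Python sketch (not taken from the note itself) that clusters one small, hypothetical 1-D dataset into k = 2 groups with two standard methods. The data, the helper names `kmeans_1d`, `single_linkage_1d` and `as_partition`, and the choice to initialise k-means at the smallest and largest points are all illustrative assumptions. Single linkage, which in 1-D amounts to cutting the k - 1 widest gaps between neighbouring points, isolates the outlying point; k-means (Lloyd's algorithm) instead reduces within-cluster variance and splits the chain near its middle.

```python
from typing import Dict, List, Sequence

# Hypothetical toy data: an evenly spaced chain plus one slightly distant point.
data: List[float] = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11.5]


def kmeans_1d(points: Sequence[float], init_centers: Sequence[float],
              max_iter: int = 100) -> List[int]:
    """Lloyd's k-means in 1-D from caller-supplied initial centers (an assumption)."""
    centers = list(init_centers)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center (ties -> lower index).
        new_labels = [min(range(len(centers)), key=lambda j: abs(p - centers[j]))
                      for p in points]
        if new_labels == labels:  # no point changed cluster: converged
            break
        labels = new_labels
        # Update step: move each center to the mean of its members (skip empty clusters).
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = sum(members) / len(members)
    return labels


def single_linkage_1d(points: Sequence[float], k: int) -> List[int]:
    """Single-linkage clustering in 1-D: cut the k - 1 largest gaps between sorted points."""
    order = sorted(range(len(points)), key=lambda i: points[i])
    # Gaps between consecutive sorted points are the minimum-spanning-tree edges in 1-D.
    gaps = [(points[order[i + 1]] - points[order[i]], i) for i in range(len(order) - 1)]
    cuts = {pos for _, pos in sorted(gaps, reverse=True)[:k - 1]}
    labels = [0] * len(points)
    cluster = 0
    for pos, idx in enumerate(order):
        labels[idx] = cluster
        if pos in cuts:  # start a new cluster after each cut gap
            cluster += 1
    return labels


def as_partition(points: Sequence[float], labels: Sequence[int]) -> List[List[float]]:
    """Label-independent view of a clustering: sorted list of sorted groups."""
    groups: Dict[int, List[float]] = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    # prints [[0, 1, 2, 3, 4, 5], [6, 7, 8, 9, 11.5]]
    print(as_partition(data, kmeans_1d(data, [0.0, 11.5])))
    # prints [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [11.5]]
    print(as_partition(data, single_linkage_1d(data, 2)))
```

Neither partition is wrong in itself; which one is useful depends on what the user needs from the clustering, which is the tool-selection question raised above.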