Journal: Mobile networks & applications

A Cautionary Tale for Machine Learning Design: why we Still Need Human-Assisted Big Data Analysis

Abstract

Supervised Machine Learning (ML) requires that smart algorithms scrutinize a very large number of labeled samples before they can make correct predictions. Yet even this is not always sufficient. In our experience, a neural network trained on a huge database of over fifteen million water meter readings essentially failed to predict when a meter would malfunction/need disassembly based on a history of water consumption measurements. In a second step, we developed a methodology, based on the enforcement of a specialized data semantics, that allowed us to extract for training only those samples that were not corrupted by data impurities. With this methodology, we re-trained the neural network up to a prediction accuracy of over 80%. Yet we simultaneously realized that the new training dataset was significantly different from the initial one in statistical terms, and much smaller as well. We had reached a sort of paradox: we had alleviated the initial problem with a better interpretable model, but we had changed the replicated form of the initial data. To reconcile that paradox, we further enhanced our data semantics with the contribution of field experts. This finally led to the extrapolation of a training dataset truly representative of regular/defective water meters and able to describe the underlying statistical phenomenon, while still providing an excellent prediction accuracy of the resulting classifier. At the end of this path, the lesson we have learnt is that a human-in-the-loop approach may significantly help to clean and re-organize noisy datasets for an empowered ML design experience.
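The abstract does not spell out the data semantics the authors enforced, so the following is only a minimal sketch of the general idea: keep for training only those labeled series that pass a set of validity rules, then check how the class balance shifts after filtering. The `MeterSeries` type, the three rules in `is_semantically_valid`, and the sample values are all hypothetical and chosen for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MeterSeries:
    meter_id: str
    readings: List[float]  # cumulative consumption readings, oldest first
    defective: bool        # label: True if the meter was found faulty


def is_semantically_valid(series: MeterSeries, min_readings: int = 3) -> bool:
    """Hypothetical validity rules (assumptions, not the paper's rules):
    enough readings, no negative values, cumulative values never decrease."""
    r = series.readings
    if len(r) < min_readings:
        return False
    if any(x < 0 for x in r):
        return False
    return all(a <= b for a, b in zip(r, r[1:]))


def filter_training_set(data: List[MeterSeries]) -> List[MeterSeries]:
    """Keep only the series that satisfy every validity rule."""
    return [s for s in data if is_semantically_valid(s)]


def defective_share(data: List[MeterSeries]) -> float:
    """Fraction of series labeled defective (0.0 for an empty set)."""
    return sum(s.defective for s in data) / len(data) if data else 0.0


# Made-up sample data, for illustration only.
raw = [
    MeterSeries("m1", [10.0, 12.5, 15.0], False),
    MeterSeries("m2", [20.0, 18.0, 25.0], True),    # decreasing reading: rejected
    MeterSeries("m3", [5.0, 5.0, 5.0, 5.0], True),  # flat but valid
    MeterSeries("m4", [7.0], True),                 # too short: rejected
]
clean = filter_training_set(raw)

print([s.meter_id for s in clean])
print(defective_share(raw), defective_share(clean))
```

In this toy data the filtered set is both smaller and differently balanced than the raw set, which is the kind of statistical shift the abstract describes; comparing such summary statistics before and after filtering is one simple way to notice it.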
