Using visualization, variable selection and feature extraction to learn from industrial data

Abstract

Although engineers in industry have access to process data, they seldom use advanced statistical tools to solve process control problems. Why this reluctance? I believe the reason lies in the history of statistical tools, which were developed in an era of rigorous mathematical modelling, manual computation and small data sets. This created sophisticated tools. Engineers do not understand the requirements of these algorithms related, for example, to the pre-processing of data. If algorithms are fed unsuitable data, or are parameterized poorly, they produce unreliable results, which may lead an engineer to reject statistical analysis in general.

This thesis looks for algorithms that probably do not impress the champions of statistics, but serve process engineers. It advocates three properties in an algorithm: supervised operation, robustness and understandability. Supervised operation allows and requires the user to make the goal of the analysis explicit, which allows the algorithm to discover results that are relevant to the user. Robust algorithms allow engineers to analyse raw process data collected from the automation system of the plant. The third property is understandability: the user must understand how to parameterize the model, what the principle of the algorithm is, and how to interpret the results.

The above criteria are justified with theories of human learning. The basis is the theory of constructivism, which defines learning as the construction of mental models. I then discuss theories of organisational learning, which show how mental models influence the behaviour of groups of people. The next level discusses statistical methodologies of data analysis and ties them to the theories of organisational learning. The last level discusses individual statistical algorithms and introduces the methodology and the algorithms proposed by this thesis. This methodology uses three types of algorithms: visualization, variable selection and feature extraction. The goal of the proposed methodology is to reliably and understandably provide the user with information related to a problem the user has defined as interesting.

The methodology is illustrated by an analysis of an industrial case: the concentrator of the Hitura mine. This case illustrates how to define the problem with off-line laboratory data, and how to search the on-line data for solutions. A major advantage of the algorithmic study of data is efficiency: the manual approach reported in the early took approximately six man-months; the automated approach of this thesis produced comparable results in a few weeks.

Bibliographic record

  • Author: Laine Sampsa
  • Year: 2003
  • Original format: PDF
  • Language: en
