DATA MINING, STRIP MINING AND OTHER HISTORICAL DATA HIGH JINX

Abstract

When one decides to embark on a data mining project, there are two key tasks that must be completed at the very beginning: clearly defining the goals and expectations of the project, and preparing the data properly before any data mining or modeling is performed. When data mining with historical data sets, one needs to understand several aspects of the data: variable data types, data structures, existence of potential outliers, equipment used at each operation, relationships, interactions and correlations between categorical and continuous variables, relationships between predictor and response variables, effects over time, basic assumptions about the distributions of the variables, and data integrity. Using SAS Institute's JMP statistical analysis software package, several solutions will be proposed to address these data issues. The following techniques are presented: making multiple scatterplots to highlight potential outliers, constructing frequency tables to highlight missing cells and small sample sizes, using date variables to compare tools running simultaneously, changing color and symbol type to add dimensionality to the data, concatenating categorical variables to look for interactions, constructing histograms and probability plots to check data distributions, and using summary sample size tables to check data integrity. These techniques will enable the analyst to make sound, realistic and statistically correct decisions when data mining with large historical data sets.
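
Two of the techniques listed in the abstract, frequency tables that expose missing cells and small sample sizes, and concatenated categorical variables, can be illustrated outside JMP. The following is only a minimal sketch in plain Python, not the author's JMP procedure: the lot records, the `operation` and `tool` levels, the function names, and the `min_n` threshold are all hypothetical and chosen purely for illustration.

```python
from collections import Counter
from itertools import product

# Hypothetical historical lot records: (operation, tool) for each processed lot.
records = [
    ("etch", "T1"), ("etch", "T1"), ("etch", "T1"), ("etch", "T2"),
    ("litho", "T1"), ("litho", "T1"), ("litho", "T1"),
]

def cell_counts(rows):
    """Count observations per combination of categorical levels.

    Every combination of the observed levels is listed, so cells that
    never occur in the data show up explicitly with a count of 0.
    """
    counts = Counter(rows)
    levels = [sorted(set(col)) for col in zip(*rows)]
    return {cell: counts.get(cell, 0) for cell in product(*levels)}

def flag_cells(table, min_n=3):
    """Split cells into missing (n == 0) and small (0 < n < min_n)."""
    missing = [cell for cell, n in table.items() if n == 0]
    small = [cell for cell, n in table.items() if 0 < n < min_n]
    return missing, small

def concatenate(row, sep="_"):
    """Join categorical levels into one combined label, e.g. 'etch_T1'."""
    return sep.join(row)

table = cell_counts(records)
missing, small = flag_cells(table, min_n=3)

# Print the frequency table keyed by the concatenated categorical variable.
for cell, n in table.items():
    print(f"{concatenate(cell):10s} n={n}")
print("missing cells:", missing)
print("small cells:", small)
```

Listing every combination of levels, rather than only the combinations that occur, is what makes empty cells visible; a plain count of observed rows would silently omit them.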

Bibliographic details

Source
International Conference on Modeling and Analysis of Semiconductor Manufacturing | 2000 | 7 pages
Author
Mark A. Sorell
Original format
PDF
Chinese Library Classification
Materials
Keywords
data mining; project goals; semiconductor manufacturing; JMP software; large data sets; outliers; data preparation

Citation
Mark A. Sorell. Data Mining, Strip Mining and Other Historical Data High Jinx. International Conference on Modeling and Analysis of Semiconductor Manufacturing (MASM 2000), May 10-12, 2000, Tempe, Arizona.
