DATA MINING, STRIP MINING AND OTHER HISTORICAL DATA HIGH JINX


Abstract

When one decides to embark on a data mining project, two key tasks must be completed at the very beginning: clearly defining the goals and expectations of the project, and preparing the data properly before any data mining or modeling is performed. When data mining with historical data sets, one needs to understand several aspects of the data: variable data types; data structures; the existence of potential outliers; the equipment used at each operation; relationships, interactions and correlations between categorical and continuous variables; relationships between predictor and response variables; effects over time; basic assumptions about the distributions of the variables; and data integrity. Using SAS Institute's JMP statistical analysis software package, several solutions are proposed to address these data issues. The following techniques are presented: making multiple scatterplots to highlight potential outliers, constructing frequency tables to highlight missing cells and small sample sizes, using date variables to compare tools running simultaneously, changing color and symbol type to add dimensionality to the data, concatenating categorical variables to look for interactions, constructing histograms and probability plots to check data distributions, and using summary sample size tables to check data integrity. These techniques enable the analyst to make sound, realistic and statistically correct decisions when data mining with large historical data sets.
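Two of the listed techniques, constructing frequency tables to highlight missing cells and small sample sizes, and concatenating categorical variables to look for interactions, can be illustrated with a minimal sketch. The paper works in JMP; the version below uses only the Python standard library and is not the authors' procedure. The function name `frequency_table`, the `tool` and `recipe` columns, the sample rows, and the `min_count=5` threshold are all hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import product


def frequency_table(rows, factors, min_count=5):
    """Cross-tabulate rows by the given categorical factors.

    Returns a dict mapping each concatenated level key (e.g. "ETCH_A|R1")
    to a (count, flag) pair, where flag is "missing", "small", or "ok".
    """
    # Observed levels of each factor, sorted for a stable cell order.
    levels = [sorted({row[f] for row in rows}) for f in factors]

    # Concatenate each row's factor levels into one combined category.
    counts = Counter("|".join(row[f] for f in factors) for row in rows)

    table = {}
    # Walk every possible combination so that empty cells show up too.
    for combo in product(*levels):
        key = "|".join(combo)
        n = counts[key]  # Counter returns 0 for unseen keys
        if n == 0:
            flag = "missing"
        elif n < min_count:  # threshold is an assumed rule of thumb
            flag = "small"
        else:
            flag = "ok"
        table[key] = (n, flag)
    return table


# Hypothetical historical records: which tool ran which recipe.
rows = (
    [{"tool": "ETCH_A", "recipe": "R1"}] * 6
    + [{"tool": "ETCH_A", "recipe": "R2"}] * 2
    + [{"tool": "ETCH_B", "recipe": "R1"}] * 7
)

table = frequency_table(rows, ["tool", "recipe"], min_count=5)
for key, (n, flag) in table.items():
    print(f"{key:<12} n={n:<3} {flag}")
```

Enumerating every combination of levels, rather than only the combinations that occur, is what exposes the empty cell: a plain count of observed keys would silently omit a tool and recipe pairing that never ran.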
