Conference: ASME International Design Engineering Technical Conferences

EVALUATING SAMPLING METHODS FOR REUSING KNOWLEDGE FROM LARGE AND ILL-STRUCTURED QUALITATIVE DATA SETS



Abstract

The desire to use ever-growing qualitative data sets of user-generated content in the engineering design process in a computationally effective manner makes it increasingly necessary to draw representative samples. This work investigated the ability of alternative sampling algorithms to draw samples that conform to the characteristics of the original data set. The sampling methods investigated were random sampling, interval sampling, fixed-increment (or systematic) sampling, and stratified sampling. Data collected through the Vehicle Owner's Questionnaire, a survey administered by the U.S. National Highway Traffic Safety Administration, are used as a case study throughout this paper. The paper demonstrates that existing statistical methods may be used to evaluate goodness of fit for samples drawn from large bodies of qualitative data. Evaluation of goodness of fit not only provides confidence that a sample is representative of the data set from which it is drawn, but also yields valuable real-time feedback during the sampling process. This investigation revealed two interesting and counterintuitive trends in sampling algorithm performance. The first is that larger sample sizes do not necessarily lead to improved goodness of fit. The second is that, depending on the details of implementation, data cleansing may degrade the performance of data sampling algorithms rather than improve it. This work illustrates the importance of aligning sampling procedures to data structures and of validating the conformance of samples to the characteristics of the larger data set, to avoid drawing erroneous conclusions based on unexpectedly biased samples of data.
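As a rough illustration of the workflow the abstract describes, the following Python sketch draws random, fixed-increment (systematic), and stratified samples from a categorical data set and scores each against the full data set. It rests on several assumptions that are not taken from the paper: the data are synthetic category labels (the real Vehicle Owner's Questionnaire records are not reproduced here), the goodness-of-fit measure is Pearson's chi-square statistic (the abstract does not name the test used), stratified sampling uses proportional allocation, and all function names are hypothetical. Interval sampling is left out because the abstract does not define it.

```python
import random
from collections import Counter, defaultdict


def random_sample(records, n, rng):
    """Simple random sample of n records without replacement."""
    return rng.sample(records, n)


def systematic_sample(records, n, rng):
    """Fixed-increment sample: every step-th record after a random start."""
    step = len(records) // n
    start = rng.randrange(step)
    return [records[start + i * step] for i in range(n)]


def stratified_sample(records, n, rng, key):
    """Proportional allocation: each stratum contributes by its share of the data."""
    strata = defaultdict(list)
    for record in records:
        strata[key(record)].append(record)
    sample = []
    for members in strata.values():
        quota = round(n * len(members) / len(records))
        sample.extend(rng.sample(members, min(quota, len(members))))
    return sample


def chi_square_statistic(sample, population, key):
    """Pearson chi-square of sample category counts vs. population proportions."""
    pop_counts = Counter(key(r) for r in population)
    observed = Counter(key(r) for r in sample)
    stat = 0.0
    for category, count in pop_counts.items():
        expected = len(sample) * count / len(population)
        stat += (observed.get(category, 0) - expected) ** 2 / expected
    return stat


# Synthetic stand-in data (assumed for illustration only): one label per record.
rng = random.Random(0)
population = ["brakes"] * 6000 + ["engine"] * 3000 + ["airbags"] * 1000
rng.shuffle(population)


def label(record):
    return record


samples = {
    "random": random_sample(population, 500, rng),
    "systematic": systematic_sample(population, 500, rng),
    "stratified": stratified_sample(population, 500, rng, label),
}
scores = {name: chi_square_statistic(s, population, label)
          for name, s in samples.items()}

for name, score in scores.items():
    print(f"{name:>10}: chi-square = {score:.3f}")
```

A smaller statistic indicates closer agreement between the sample's category proportions and those of the full data set. In this toy setup the stratified sample scores zero by construction, because proportional allocation reproduces the category shares exactly. Comparing the statistic against a chi-square critical value (here with two degrees of freedom) would turn the score into a formal test.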
