Handling of missing data to improve the mining of large feed databases.

Maroto-Molina F.; Gomez-Cabrera A.; Guerrero-Ginel J. E.; Garrido-Varo A.; Sauvant D.; Tran G.; Heuze V.; Perez-Marin D. C.


Abstract

Feed databases often have missing data. Despite their potentially major effect on data analysis (e.g., as a source of biased results and loss of statistical power), database managers and nutrition researchers have paid little attention to missing data. This study evaluated various methods of handling missing data using mining outputs from a database containing data on chemical composition and nutritive value for 18,864 alfalfa samples. A complete reference dataset was obtained, comprising the 2,303 cases with no missing data for the attributes CP, crude fiber (CF), NDF, ADF and ADL. This dataset was used to simulate 2 types of missing data (at random and not at random), each with 2 loss intensities (33 and 66%), thus yielding a total of 4 incomplete datasets. Missing data from these datasets were handled using 2 deletion methods and 4 imputation methods. Outputs in terms of the identification and typing of alfalfa (using ANOVA and descriptive statistics) and of correlations between attributes (using regressions) were compared with outputs from the complete dataset. Imputation methods, particularly model-based versions, were found to perform better than deletion methods in terms of maximizing information use and minimizing bias, although the extent of differences between methods depended on the type of missing data. The best approximation to the uncertainty value was provided by multiple imputation methods. It was concluded that the choice of the most suitable method for handling missing data depended both on the type of missing data and on the purpose of data analysis.
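The abstract does not specify the simulation scheme or name the deletion and imputation methods that were compared. The Python sketch below illustrates the general kind of experiment it describes, under explicit assumptions:

- **Data.** It uses hypothetical synthetic NDF/ADF values, not the alfalfa data.
- **Missingness.** ADF values are removed either completely at random (MCAR) or from the highest ADF values (an extreme "not at random" scheme assumed here for illustration).
- **Methods compared.** The sketch compares complete-case deletion, mean imputation, single regression imputation from NDF, and a bootstrap-based multiple imputation pooled with Rubin's rules. With only one incomplete attribute, listwise and pairwise deletion coincide, so a single deletion estimate is shown.

All function names, parameters and distributions are illustrative, not the authors' implementation.

```python
import random
import statistics


def make_complete(n, seed=1):
    """Hypothetical synthetic (NDF, ADF) pairs, % of DM; NOT the study's data."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        ndf = rng.gauss(45.0, 6.0)
        adf = 0.75 * ndf + rng.gauss(0.0, 1.5)  # assumed strong NDF-ADF link
        rows.append((ndf, adf))
    return rows


def mask_adf(rows, rate, mechanism, seed=2):
    """Set ADF to None in round(rate * n) rows.

    MCAR: rows are chosen uniformly at random.
    MNAR: the rows with the highest ADF are chosen, so missingness depends
    on the unobserved value itself (a hypothetical, extreme scheme).
    """
    rng = random.Random(seed)
    k = round(rate * len(rows))
    if mechanism == "MCAR":
        drop = set(rng.sample(range(len(rows)), k))
    elif mechanism == "MNAR":
        ranked = sorted(range(len(rows)), key=lambda i: rows[i][1], reverse=True)
        drop = set(ranked[:k])
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    return [(x, None if i in drop else y) for i, (x, y) in enumerate(rows)]


def fit_line(pairs):
    """Ordinary least squares of y on x; returns intercept, slope, residual SD."""
    mx = statistics.mean(x for x, _ in pairs)
    my = statistics.mean(y for _, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    slope = sum((x - mx) * (y - my) for x, y in pairs) / sxx
    intercept = my - slope * mx
    resid_sd = statistics.stdev([y - (intercept + slope * x) for x, y in pairs])
    return intercept, slope, resid_sd


def regression_impute(data):
    """Single deterministic imputation: fill ADF with its prediction from NDF."""
    a, b, _ = fit_line([(x, y) for x, y in data if y is not None])
    return [y if y is not None else a + b * x for x, y in data]


def multiple_impute_mean(data, m=20, seed=3):
    """Pool the ADF mean over m imputations with Rubin's rules."""
    rng = random.Random(seed)
    obs = [(x, y) for x, y in data if y is not None]
    estimates, within = [], []
    for _ in range(m):
        # Bootstrapping the complete cases propagates coefficient uncertainty;
        # the Gaussian term adds residual noise to each imputed value.
        a, b, s = fit_line([rng.choice(obs) for _ in obs])
        filled = [y if y is not None else a + b * x + rng.gauss(0.0, s)
                  for x, y in data]
        estimates.append(statistics.mean(filled))
        within.append(statistics.variance(filled) / len(filled))
    between = statistics.variance(estimates)
    total = statistics.mean(within) + (1 + 1 / m) * between
    return statistics.mean(estimates), total ** 0.5


def compare(rate, mechanism, n=2000):
    """Contrast handling methods against the complete-data ADF mean."""
    complete = make_complete(n)
    truth = [y for _, y in complete]
    data = mask_adf(complete, rate, mechanism)
    cc_mean = statistics.mean(y for _, y in data if y is not None)
    mean_imp = [cc_mean if y is None else y for _, y in data]
    reg_imp = regression_impute(data)
    mi_mean, mi_se = multiple_impute_mean(data)
    return {
        "truth": statistics.mean(truth),
        "deletion": cc_mean,  # complete-case (listwise) analysis
        "regression_imputation": statistics.mean(reg_imp),
        "multiple_imputation": mi_mean,
        "mi_se": mi_se,
        "naive_se_regression": statistics.stdev(reg_imp) / n ** 0.5,
        "sd_truth": statistics.stdev(truth),
        "sd_mean_imputation": statistics.stdev(mean_imp),
    }
```

**MNAR case.** Under these assumptions, `compare(0.66, "MNAR")` shows deletion underestimating the complete-data ADF mean, and mean imputation inherits that estimate. Regression imputation from NDF recovers part of the gap but not all of it. Mean imputation also shrinks the standard deviation.

**MCAR case.** With `compare(0.33, "MCAR")`, the complete-case mean stays close to the truth. The Rubin's-rules standard error from multiple imputation is larger than the naive standard error of a single regression imputation, because the latter treats imputed values as if they were observed.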


Bibliographic details

Source: Journal of Animal Science, 2013, no. 1, 10 pages
Authors: Maroto-Molina F.; Gomez-Cabrera A.; Guerrero-Ginel J. E.; Garrido-Varo A.; Sauvant D.; Tran G.; Heuze V.; Perez-Marin D. C.
Original format: PDF
Text language: English
Chinese Library Classification: Zoology
Keywords: data analysis; feed database; feed table; imputation; missing data; prediction of nutritive value


Similar documents

1. Handling of missing data to improve the mining of large feed databases [J]. Maroto-Molina F., Gomez-Cabrera A., Guerrero-Ginel J. E. Journal of Animal Science, 2013, no. 1
2. Handling of missing data to improve the mining of large feed databases [J]. Pérez-Marín D. C. Journal of Animal Science, 2013, no. 1
3. Data pre-processing to improve the mining of large feed databases [J]. Maroto-Molina F., Gomez-Cabrera A., Guerrero-Ginel J. E. Animal, 2013, no. 7
4. Respecting Data Privacy in Educational Data Mining: An Approach to the Transparent Handling of Student Data and Dealing with the Resulting Missing Value Problem [C]. Alexander Askinadze, Stefan Conrad. IEEE International Conference on Enabling Technologies, 2018
5. Data mining techniques for handling a missing data problem [D]. Siripitayananon, Punnee. 2002
6. Characterization of missing values in untargeted MS-based metabolomics data and evaluation of missing data handling strategies [O]. Kieu Trinh Do, Simone Wahl, Johannes Raffler
7. Enhancing association rules algorithms for mining distributed databases. Integration of fast BitTable and multi-agent association rules mining in distributed medical databases for decision support [O]. Abdo Walid Adly Atteya. 2012
