Journal: Bioprocess and Biosystems Engineering

A heuristic approach to handling missing data in biologics manufacturing databases



Abstract

The biologics sector has amassed a wealth of data over the past three decades, in line with bioprocess development and manufacturing guidelines. Precise analysis of these data is expected to reveal behavioural patterns in cell populations that can be used to predict how future culture processes might behave. The historical bioprocessing data likely comprise experiments conducted using different cell lines to produce different products, possibly years apart; this situation causes inter-batch variability, and data points go missing owing to human- and instrument-associated technical oversights. These unavoidable complications necessitate a pre-processing step prior to data mining. This study investigated the efficiency of mean imputation and multivariate regression for filling in the missing information in historical bio-manufacturing datasets, and evaluated their performance using symbolic regression models and Bayesian non-parametric models in subsequent data processing. Mean substitution was shown to be a simple and efficient imputation method for relatively smooth, non-dynamical datasets, and regression imputation was effective in dynamical datasets with less than 30% missing data while maintaining the existing standard deviation and shape of the distribution. The nature of the missing information, whether Missing Completely At Random, Missing At Random or Missing Not At Random, emerged as the key feature for selecting the imputation method.
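The two imputation strategies named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `mean_impute` and `regression_impute`, the toy culture-time and cell-density values, and the use of a single-predictor least-squares line in place of the study's multivariate regression are all assumptions made for illustration.

```python
import statistics


def mean_impute(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]


def regression_impute(x, y):
    """Fill None entries of y using a least-squares line fitted on complete pairs.

    Simplification (assumption): one predictor only, standing in for the
    multivariate regression mentioned in the abstract.
    """
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    x_bar = statistics.fmean(p[0] for p in pairs)
    y_bar = statistics.fmean(p[1] for p in pairs)
    sxx = sum((xi - x_bar) ** 2 for xi, _ in pairs)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in pairs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [intercept + slope * xi if yi is None else yi for xi, yi in zip(x, y)]


# Hypothetical toy batch: culture time (h) and a cell-density reading
# with two missing measurements. These numbers are made up.
time_h = [0, 24, 48, 72, 96, 120]
density = [1.0, 2.0, None, 4.0, None, 6.0]

by_mean = mean_impute(density)
by_regression = regression_impute(time_h, density)

print("mean imputation:      ", by_mean)
print("regression imputation:", by_regression)
print("spread after mean imputation:      ", statistics.pstdev(by_mean))
print("spread after regression imputation:", statistics.pstdev(by_regression))
```

On a trending (dynamical) series like this toy one, filling gaps with the overall mean flattens the trajectory and reduces the spread, while the regression fill follows the trend. This is consistent with the abstract's observation that regression imputation better maintains the standard deviation in dynamical datasets. Neither sketch addresses whether the data are Missing Completely At Random, Missing At Random or Missing Not At Random, which the abstract identifies as the deciding factor.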
