In many important bioinformatics problems the data sets contain considerable redundancy, owing both to the evolutionary processes that generate the data and to biases in the data collection procedures. Standard practice in bioinformatics is to remove this redundancy so that no two sequences in a data set share more than forty percent similarity. For small data sets this can dilute already impoverished data beyond the point of practicality. Alternatively, one can include all available data and simply ensure that the training and test samples are separated by the required redundancy gap. However, this encourages overfitting, because the model is exposed to a highly redundant training set. We outline a process of multi-stage redundancy reduction whereby scarce data can be utilised effectively without compromising the integrity of the model or of the testing procedure.
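The contrast between a single flat redundancy filter and a staged one can be sketched in code. The example below is a minimal illustration under stated assumptions, not the authors' published procedure. The helper names (`similarity`, `greedy_nonredundant`, `gap_filter`, `two_stage`) are hypothetical, and `difflib.SequenceMatcher.ratio()` is used only as a dependency-free stand-in for alignment-based sequence identity. The two-stage variant assumes one plausible reading of the idea: the test set and the train/test gap use the strict forty percent threshold, while the training set is thinned internally at a looser, hypothetical threshold.

```python
from difflib import SequenceMatcher
from typing import List, Sequence, Tuple


def similarity(a: str, b: str) -> float:
    """Crude stand-in for pairwise sequence identity, in [0, 1].

    Assumption: real pipelines use alignment-based identity; difflib's
    ratio() is used here only to keep the sketch self-contained.
    """
    return SequenceMatcher(None, a, b).ratio()


def greedy_nonredundant(seqs: Sequence[str], threshold: float) -> List[str]:
    """Keep a sequence only if it is below `threshold` similarity to
    every sequence already kept (a simple greedy filter)."""
    kept: List[str] = []
    for s in seqs:
        if all(similarity(s, k) < threshold for k in kept):
            kept.append(s)
    return kept


def gap_filter(train: Sequence[str], test: Sequence[str], threshold: float) -> List[str]:
    """Drop training sequences that are at least `threshold` similar to
    any test sequence, so a redundancy gap separates the two sets."""
    return [t for t in train if all(similarity(t, q) < threshold for q in test)]


def two_stage(
    seqs: Sequence[str],
    test_indices: Sequence[int],
    test_threshold: float = 0.4,   # strict threshold (forty percent)
    train_threshold: float = 0.8,  # hypothetical looser internal threshold
) -> Tuple[List[str], List[str]]:
    """Hypothetical two-stage scheme:
    1. make the test set non-redundant at the strict threshold;
    2. keep only training data below that threshold relative to the test
       set, then thin the training set internally at a looser threshold."""
    test_idx = set(test_indices)
    test = greedy_nonredundant([seqs[i] for i in test_indices], test_threshold)
    pool = [s for i, s in enumerate(seqs) if i not in test_idx]
    train = gap_filter(pool, test, test_threshold)
    train = greedy_nonredundant(train, train_threshold)
    return train, test
```

On the toy input `["AAAAAAAAAA", "AAAAAAAAAC", "CCCCCCCCCC", "GGGGGGGGGG", "GGGGGGGGGT"]`, a flat filter at 0.4 retains only three sequences in total. Calling `two_stage` with sequence 0 as the test set and `train_threshold=0.95` keeps that test sequence plus three training sequences, because the near-duplicate `"GGGGGGGGGT"` is far from the test set and survives the looser internal threshold. The thresholds here are illustrative assumptions, not values taken from the paper.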