Conference: Institute of Electrical and Electronics Engineers Symposium on Bioinformatics and Bioengineering

Detecting experimental noises in protein-protein interactions with iterative sampling and model-based clustering

Abstract

One of the most important issues in current molecular biology is building accurate networks of protein-protein interactions. Recently developed high-throughput experimental techniques have accumulated a vast amount of protein-protein interaction data, but it is well known that the reliability of these data has not reached a satisfactory level. In this paper, we attempt to computationally detect experimental errors, or noise, presumably contained in protein-protein interaction data, using an iterative sampling method that employs the learning of a stochastic model as its subroutine. The method alternates between two steps: selecting examples that can be regarded as non-noise, and training the component algorithm on the selected examples. Noise candidates are the examples with the smallest average likelihoods computed by the previously obtained stochastic models. We empirically compared the method with two other methods on both synthetic and real data sets. We examined the effect of noise and data size using medium- and large-sized synthetic data sets to which noise was added intentionally. The results on the medium-sized synthetic data sets show that the significance level of the performance difference between the method and the two other methods is more pronounced for higher noise ratios. Further experiments show that this finding also holds for a large-scale data set. The performance advantage of the method was further confirmed by experiments on a real protein-protein interaction data set.
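
The alternating select-and-train loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the stochastic model, so a Laplace-smoothed pair-frequency model stands in for it here. The function names (`train_model`, `detect_noise`), the parameters (`n_rounds`, `keep_ratio`, `n_noise`), the random initial sample, and the use of log-likelihoods for the averages are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def train_model(pairs, n_outcomes, alpha=1.0):
    """Hypothetical stand-in model: Laplace-smoothed frequency of each pair.

    Returns a function mapping a pair to its log-likelihood.
    """
    counts = Counter(pairs)
    denom = len(pairs) + alpha * n_outcomes
    return lambda pair: math.log((counts[pair] + alpha) / denom)


def detect_noise(pairs, n_rounds=10, keep_ratio=0.8, n_noise=5, seed=0):
    """Return indices of the n_noise examples with the smallest
    average log-likelihood over all models trained so far."""
    rng = random.Random(seed)
    n_outcomes = len(set(pairs))
    n_keep = max(1, int(keep_ratio * len(pairs)))
    # Assumption: no model exists before the first round,
    # so the first training set is a random subsample.
    selected = rng.sample(range(len(pairs)), n_keep)
    score_sums = [0.0] * len(pairs)
    for _ in range(n_rounds):
        # Step 1: train the component model on the selected examples.
        model = train_model([pairs[i] for i in selected], n_outcomes)
        for i, pair in enumerate(pairs):
            score_sums[i] += model(pair)
        # Step 2: reselect the examples that look least like noise,
        # i.e. those with the highest accumulated log-likelihood.
        ranked = sorted(range(len(pairs)),
                        key=lambda i: score_sums[i], reverse=True)
        selected = ranked[:n_keep]
    averages = [s / n_rounds for s in score_sums]
    return sorted(range(len(pairs)), key=lambda i: averages[i])[:n_noise]


# Toy synthetic data (not from the paper): three recurring pairs
# followed by five one-off pairs that play the role of noise.
pairs = [("A", "B"), ("C", "D"), ("E", "F")] * 30
pairs += [("N%d" % k, "M%d" % k) for k in range(5)]
noise_idx = detect_noise(pairs)
print(sorted(noise_idx))
```

On this toy input the five one-off pairs at the end of the list receive the lowest average scores, because each round's model is trained almost entirely on the recurring pairs. A real application would replace `train_model` with whatever stochastic model suits the interaction data.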
