首页> 外文期刊>Computational Biology and Bioinformatics, IEEE/ACM Transactions on >A Trade-Off between Sample Complexity and Computational Complexity in Learning Boolean Networks from Time-Series Data
【24h】

A Trade-Off between Sample Complexity and Computational Complexity in Learning Boolean Networks from Time-Series Data

机译:从时间序列数据中学习布尔网络时,样本复杂度和计算复杂度之间需要进行权衡

获取原文
获取原文并翻译 | 示例

摘要

A key problem in molecular biology is to infer regulatory relationships between genes from expression data. This paper studies a simplified model of such inference problems in which one or more Boolean variables, modeling, for example, the expression levels of genes, each depend deterministically on a small but unknown subset of a large number of Boolean input variables. Our model assumes that the expression data comprises a time series, in which successive samples may be correlated. We provide bounds on the expected amount of data needed to infer the correct relationships between output and input variables. These bounds improve and generalize previous results for Boolean network inference and continuous-time switching network inference. Although the computational problem is intractable in general, we describe a fixed-parameter tractable algorithm that is guaranteed to provide at least a partial solution to the problem. Most interestingly, both the sample complexity and computational complexity of the problem depend on the strength of correlations between successive samples in the time series but in opposing ways. Uncorrelated samples minimize the total number of samples needed while maximizing computational complexity; a strong correlation between successive samples has the opposite effect. This observation has implications for the design of experiments for measuring gene expression.
机译:分子生物学中的一个关键问题是从表达数据推断基因之间的调控关系。本文研究了此类推论问题的简化模型,其中一个或多个布尔变量(例如,对基因的表达水平进行建模)各自确定性地依赖于大量布尔输入变量的一小部分但未知的子集。我们的模型假设表达数据包括一个时间序列,在该时间序列中可能会关联到连续的样本。我们提供了推断输出变量和输入变量之间正确关系所需的预期数据量的界限。这些界限改进并归纳了布尔网络推断和连续时间交换网络推断的先前结果。尽管计算问题通常是棘手的,但我们描述了一种固定参数的可处理算法,该算法可保证至少提供该问题的部分解决方案。最有趣的是,问题的样本复杂度和计算复杂度都取决于时间序列中相继样本之间相关性的强度,但方式相反。不相关的样本将所需的样本总数减至最少,同时使计算复杂度最大化。连续样本之间的强相关具有相反的效果。该观察结果对测量基因表达的实验设计具有启示意义。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号