A machine learning approach for designing DNA sequence assembly algorithms.

Abstract

We present two separate algorithms for solving the DNA sequence assembly problem. The sequence assembly problem is the reconstruction of a large DNA sequence from a set of subsequences called fragments. Fragments are created by breaking copies of the original DNA sequence at random intervals. This yields a set of fragments, many of which overlap one another. Identifying these overlapping fragments is the key to reconstructing the original strand.

The first algorithm identifies a “correct” series of fragment merges, one that would reproduce the original sample from which the fragments were obtained. It enters each such series into a database of solutions, which is then used to sequence DNA different from that used to create the database.

The second algorithm uses a k-mer based approach to identify overlapping regions in fragments. It improves on the first algorithm in two ways: (1) it is designed to sequence real fragments, which differ in composition from simulated fragments; (2) it can be used to sequence much longer strands of DNA.

For both algorithms, parameters of computation are learned through experimentation with previously assembled DNA sequences. Our experiments show that the parameters learned on one set of DNA sequences can be used to successfully sequence a separate set of DNA sequences.
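The abstract does not say how fragments are simulated or how k-mers are used to detect overlaps, so the following is only a minimal sketch of the general idea, not the thesis's method. It assumes fragments are plain strings over A/C/G/T, breaks copies of a sequence at random intervals, and flags two fragments as candidate overlaps when they share at least one k-mer. The names `make_fragments`, `kmers`, and `candidate_overlaps`, and the parameters `k`, `min_len`, and `max_len`, are hypothetical.

```python
import random
from collections import defaultdict
from itertools import combinations


def make_fragments(sequence, copies, min_len, max_len, rng):
    """Break `copies` copies of `sequence` into pieces of random length.

    Assumption: piece lengths are drawn uniformly from [min_len, max_len];
    the last piece of each copy may be shorter than min_len.
    """
    fragments = []
    for _ in range(copies):
        pos = 0
        while pos < len(sequence):
            step = rng.randint(min_len, max_len)
            fragments.append(sequence[pos:pos + step])
            pos += step
    return fragments


def kmers(fragment, k):
    """Return the set of all length-k substrings of `fragment`."""
    return {fragment[i:i + k] for i in range(len(fragment) - k + 1)}


def candidate_overlaps(fragments, k):
    """Map each fragment pair (i, j), i < j, to its number of shared k-mers.

    Pairs that share no k-mer are omitted. A shared k-mer only suggests
    an overlap; a real assembler would verify it with an alignment.
    """
    index = defaultdict(set)  # k-mer -> ids of fragments containing it
    for frag_id, frag in enumerate(fragments):
        for kmer in kmers(frag, k):
            index[kmer].add(frag_id)

    shared = defaultdict(int)
    for ids in index.values():
        for i, j in combinations(sorted(ids), 2):
            shared[(i, j)] += 1
    return dict(shared)


# Hand-made example: fragments 0 and 1 share the 4-mer "CGGA".
example = ["ACGTACGGA", "CGGATTCA", "TTTTTTTT"]
print(candidate_overlaps(example, 4))  # {(0, 1): 1}
```

Indexing k-mers first means only fragments that actually share a k-mer are ever compared, instead of every pair. The choice of `k` is exactly the kind of computation parameter the abstract says is learned from previously assembled DNA; here it is simply passed in by hand.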

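Bibliographic record

  • Author

    Lim, Darren Troy

  • Author affiliation

    Rensselaer Polytechnic Institute

  • Degree-granting institution: Rensselaer Polytechnic Institute
  • Subject: Computer Science; Biology, Genetics
  • Degree: Ph.D.
  • Year: 2003
  • Pagination: 95 p.
  • Total pages: 95
  • Original format: PDF
  • Language of text: eng
  • Chinese Library Classification: Automation and computer technology; Genetics
  • Keywords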
