Mining Polymorphic SSRs from Individual Genome Sequences

Abstract

Simple sequence repeats (SSRs) are abundant in genome sequences and have become popular biomarkers for genetic studies. Several SSRs have proved essential for gene regulation, and abnormal repeat patterns in these critical SSRs may cause lethal diseases. Next-generation sequencing (NGS) technologies provide efficient approaches for detecting SSR polymorphism. However, previous approaches to identifying SSR markers relied on inefficient, manually curated processes. An automatic and efficient system for detecting polymorphic SSRs at genomic scale is proposed that requires no manual curation or inspection. The workflow accepts multiple NGS datasets and begins with assembly by either de novo or reference-mapping approaches. Consensus sequences are then obtained from the assembled contigs, and the coordinates of each individual contig are calibrated by alignment to the selected reference sequences. Next, the SSR mining mechanism retrieves all potential polymorphic SSRs that arise from insertion or deletion events. Trio datasets from the 1000 Genomes Project were used as test sequences, and the CODIS SSR markers and 9 well-known disease-related SSR motifs served as verification targets. The results show that the proposed method can identify known polymorphic SSRs as well as novel SSR markers, provided the consensus sequences contain no sequencing or mapping errors. By using NGS data to identify SSR polymorphism, the proposed method accelerates related research and facilitates the selection of novel SSR biomarkers and the discovery of regulatory elements.
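
The final mining step described above, comparing repeat tracts across individuals to find loci where insertions or deletions change the copy number, can be illustrated with a toy sketch. This is not the authors' implementation: the function names `find_ssrs` and `polymorphic_ssrs`, the regex-based definition of an SSR (perfect tandem repeats only), the use of a short left flank to identify a locus, and the example sequences are all assumptions made for illustration. It also presumes the per-individual consensus sequences are already available and error-free.

```python
import re
from typing import Dict, List, NamedTuple, Tuple


class SSR(NamedTuple):
    start: int    # 0-based start of the repeat tract
    motif: str    # repeat unit, e.g. "CAG"
    repeats: int  # number of tandem copies


def find_ssrs(seq: str, min_motif: int = 2, max_motif: int = 6,
              min_repeats: int = 3) -> List[SSR]:
    """Return perfect tandem repeats in seq (a simplified SSR definition)."""
    # The lazy quantifier makes the regex try the shortest repeat unit first.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_motif, max_motif, min_repeats - 1)
    )
    hits = []
    for m in pattern.finditer(seq.upper()):
        motif = m.group(1)
        hits.append(SSR(m.start(), motif, len(m.group(0)) // len(motif)))
    return hits


def polymorphic_ssrs(samples: Dict[str, str], flank: int = 5
                     ) -> Dict[Tuple[str, str], Dict[str, int]]:
    """Group SSRs across samples by (left flank, motif) and keep only the
    loci whose repeat counts differ between samples."""
    loci: Dict[Tuple[str, str], Dict[str, int]] = {}
    for name, seq in samples.items():
        for ssr in find_ssrs(seq):
            if ssr.start < flank:
                continue  # not enough upstream sequence to anchor the locus
            key = (seq[ssr.start - flank:ssr.start].upper(), ssr.motif)
            loci.setdefault(key, {})[name] = ssr.repeats
    return {k: v for k, v in loci.items() if len(set(v.values())) > 1}


# Hypothetical consensus sequences for a trio; the CAG tract differs in length.
trio = {
    "father": "GATTACAGTC" + "CAG" * 4 + "TTGACCT",
    "mother": "GATTACAGTC" + "CAG" * 6 + "TTGACCT",
    "child":  "GATTACAGTC" + "CAG" * 6 + "TTGACCT",
}
result = polymorphic_ssrs(trio)
print(result)
# {('CAGTC', 'CAG'): {'father': 4, 'mother': 6, 'child': 6}}
```

Keying loci by flanking sequence rather than by raw position sidesteps the coordinate shifts that upstream indels introduce; the workflow in the abstract instead calibrates contig coordinates against a reference, which this sketch does not attempt.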
