
Pattern analysis approach reveals restriction enzyme cutting abnormalities and other cDNA library construction artifacts using raw EST data


Abstract

Background: Expressed Sequence Tag (EST) sequences are widely used in applications such as genome annotation, gene discovery and gene expression studies. However, some GenBank dbEST sequences have proven to be "unclean". Identifying cDNA termini and their structures in raw ESTs not only facilitates data quality control and accurate delineation of transcription ends, but also furthers our understanding of the potential sources of data abnormalities and errors in the wet-lab procedures for cDNA library construction.
Results: After analyzing a total of 309,976 raw Pinus taeda ESTs, we uncovered many distinct variations of cDNA termini, some of which prove to be good indicators of wet-lab artifacts, and characterized each raw EST by its cDNA terminus structure patterns. In contrast to the expected patterns, many ESTs displayed complex and/or abnormal patterns that represent potential wet-lab errors such as: a failure of one or both of the restriction enzymes to cut the plasmid vector; a failure of the restriction enzymes to cut the vector at the correct positions; the insertion of two cDNA inserts into a single vector; the insertion of multiple and/or concatenated adapters/linkers; the presence of 3'-end terminal structures in designated 5'-end sequences or vice versa; and so on. Close examination of these artifacts shows that many problematic ESTs deposited into public databases by conventional bioinformatics pipelines or tools could be cleaned or filtered by our methodology. We developed a software tool for Abnormality Filtering and Sequence Trimming for ESTs (AFST, http://code.google.com/p/afst/) using a pattern analysis approach. To compare AFST with other pipelines that submitted ESTs into dbEST, we reprocessed 230,783 Pinus taeda and 38,709 Arachis hypogaea GenBank ESTs. We found that 7.4% of Pinus taeda and 29.2% of Arachis hypogaea GenBank ESTs are "unclean" or abnormal, all of which could be cleaned or filtered by AFST.
Conclusions: cDNA terminal pattern analysis, as implemented in the AFST software tool, can be utilized to reveal wet-lab errors such as restriction enzyme cutting abnormalities and chimeric EST sequences, detect various data abnormalities embedded in existing Sanger EST datasets, improve the accuracy of identifying and extracting bona fide cDNA inserts from raw ESTs, and therefore greatly benefit downstream EST-based applications.
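
To make the idea of terminus structure patterns more concrete, the following is a minimal Python sketch of how a raw read might be reduced to an ordered string of features and compared against expected patterns. It is not AFST's actual algorithm, which the abstract does not describe. The vector and adapter sequences, the poly(A) length threshold, the set of "expected" patterns, and the names `terminus_pattern` and `classify` are all assumptions made for illustration, and the sketch uses exact string matching where a real tool would need to tolerate sequencing errors.

```python
import re

# Hypothetical feature sequences, for illustration only. Real vector and
# adapter sequences depend on the cloning kit used for a given library.
FEATURES = {
    "V": "GGCCGCTCTAGA",    # assumed vector flank
    "A": "GAATTCGGCACGAG",  # assumed adapter/linker
}
POLY_A = re.compile(r"A{12,}")  # assumed minimum poly(A) tail length

# Assumed set of patterns treated as normal in this sketch.
EXPECTED = {"VAI", "VAIP", "VAIPV"}


def terminus_pattern(read):
    """Return the left-to-right order of features in a read, e.g. 'VAIP'.

    V = vector, A = adapter, P = poly(A), I = unannotated stretch
    (a candidate cDNA insert).
    """
    hits = []
    for label, seq in FEATURES.items():
        for m in re.finditer(re.escape(seq), read):
            hits.append((m.start(), m.end(), label))
    for m in POLY_A.finditer(read):
        hits.append((m.start(), m.end(), "P"))
    hits.sort()

    pattern, pos = [], 0
    for start, end, label in hits:
        if start > pos:
            pattern.append("I")  # gap between annotated features
        pattern.append(label)
        pos = max(pos, end)
    if pos < len(read):
        pattern.append("I")  # trailing unannotated sequence
    return "".join(pattern)


def classify(read):
    """Return (pattern, label) using a few illustrative rules."""
    pattern = terminus_pattern(read)
    if pattern in EXPECTED:
        label = "expected"
    elif "AA" in pattern:
        label = "concatenated adapters"
    elif "IAI" in pattern:
        label = "possible double insert"
    else:
        label = "unclassified abnormality"
    return pattern, label


if __name__ == "__main__":
    insert = "ATGGCTTCCGGTACCTTGCG"  # made-up insert sequence
    read = FEATURES["V"] + FEATURES["A"] + insert + "A" * 15
    print(classify(read))
```

Encoding each read as a short pattern string keeps the rules readable: each artifact class listed in the Results corresponds to a recognizable arrangement of features, so new rules can be added as further string tests.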
