Statistical assessment of discriminative features for protein-coding and non coding cross-species conserved sequence elements

Teresa M Creanza; David S Horner; Annarita D'Addabbo; Rosalia Maglietta; Flavio Mignone; Nicola Ancona; Graziano Pesole

Abstract

Background The identification of protein-coding elements in sets of mammalian conserved elements is one of the major challenges in current molecular biology research. Many features have been proposed for automatically distinguishing coding from non-coding conserved sequences, which makes a systematic statistical assessment of their differences necessary. A comprehensive study should comprise an association study, i.e. a comparison of the distributions of the features in the two classes, and a prediction study in which the prediction accuracies of classifiers trained on single features and on groups of features are analyzed, conditional on the species compared and on the sequence lengths.

Results In this paper we compared the distributions of a set of comparative and non-comparative features and evaluated the prediction accuracy of classifiers trained to discriminate sequence elements conserved among human, mouse and rat. The association study showed that the analyzed features differ statistically between the two classes. To study the influence of sequence length on feature performance, a predictive study was performed on different data sets, each composed of equal numbers of coding and non-coding alignments of equal length, with the average length increasing from one data set to the next. We found that the most discriminant feature was a comparative measure indicating the proportion of synonymous nucleotide substitutions per synonymous site. Moreover, linear discriminant classifiers trained on comparative features in general outperformed classifiers based on intrinsic ones. Finally, the prediction accuracy of classifiers trained on comparative features increased significantly when intrinsic features were added to the set of input variables, independently of sequence length (Kolmogorov-Smirnov P-value ≤ 0.05).

Conclusion We observed distinct and consistent patterns for the individual and combined use of comparative and intrinsic classifiers, both with respect to different lengths of sequences/alignments and with respect to error rates in the classification of coding and non-coding elements. In particular, we noted that comparative features tend to be more accurate in the classification of coding sequences. This is likely because such features capture deviations from strictly neutral evolution that are expected as a consequence of the characteristics of the genetic code.
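
The two-part design described in the abstract (an association study followed by a prediction study) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the feature values are synthetic, the function names `ks_statistic`, `fit_midpoint_rule` and `is_coding` are hypothetical, and the abstract does not say which test the authors used to compare feature distributions. The two-sample Kolmogorov-Smirnov statistic is used here only as one plausible choice, and the classifier is the simplest one-feature case of a linear discriminant (equal priors, equal variances), not the authors' implementation.

```python
import random
import statistics


def ks_statistic(xs, ys):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap
    between the empirical CDFs of the two samples."""
    xs, ys = sorted(xs), sorted(ys)
    i = j = 0
    d = 0.0
    while i < len(xs) and j < len(ys):
        v = min(xs[i], ys[j])
        # Advance both samples past every value <= v, then compare CDFs.
        while i < len(xs) and xs[i] <= v:
            i += 1
        while j < len(ys) and ys[j] <= v:
            j += 1
        d = max(d, abs(i / len(xs) - j / len(ys)))
    return d


def fit_midpoint_rule(coding, noncoding):
    """One-feature linear discriminant under equal priors and equal
    variances: the boundary is the midpoint between the class means."""
    mean_c = statistics.fmean(coding)
    mean_n = statistics.fmean(noncoding)
    return (mean_c + mean_n) / 2, mean_c < mean_n


def is_coding(x, threshold, coding_below):
    """Predict 'coding' when x falls on the coding side of the threshold."""
    return (x < threshold) == coding_below


# Synthetic, purely illustrative feature values (NOT data from the paper).
rng = random.Random(0)
coding = [rng.gauss(0.2, 0.1) for _ in range(400)]
noncoding = [rng.gauss(0.6, 0.1) for _ in range(400)]

# Association study: how far apart are the two empirical distributions?
d = ks_statistic(coding, noncoding)

# Prediction study: fit on the first half, score on the held-out second half.
threshold, coding_below = fit_midpoint_rule(coding[:200], noncoding[:200])
hits = sum(is_coding(x, threshold, coding_below) for x in coding[200:])
hits += sum(not is_coding(x, threshold, coding_below) for x in noncoding[200:])
accuracy = hits / 400

print(f"KS statistic: {d:.3f}, held-out accuracy: {accuracy:.3f}")
```

Repeating the second step on data sets of increasing average alignment length, and with several features combined, would mirror the length-conditional comparison the abstract describes. A multi-feature discriminant would need a covariance estimate, which this sketch leaves out.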

Bibliographic record

Source: BMC Bioinformatics, 2009, issue 6
Original format: PDF
Classification (Chinese Library Classification): Biological sciences
