Proceedings of the 7th Asian-Pacific Conference on Medical and Biological Engineering

A Novel Method of Prokaryotic Promoter Regions Prediction with Feature Selection: Quadratic Discriminant Analysis Approach


Abstract

Promoter identification is an essential task in the study of transcription regulation, but the prediction accuracy of current methods is still far from what is expected. An effective and reliable prediction method for prokaryotic promoter regions would be very helpful. We have developed a quadratic discriminant analysis (QDA) method based on feature selection to predict prokaryotic promoter regions, which are classified by their location in the genome. To use more characteristic information, we include content features, signal features, and structure features of the promoters in the candidate feature set and construct suitable statistical models to calculate them. For the main conserved signal features in particular, a composite motif model is adopted, whose optimal parameters are obtained by an iterative search algorithm, OPSIA. Using the squared Mahalanobis distance as the measure, the discriminating features are selected from the candidate features through a stepwise procedure and combined into a multidimensional vector. QDA then uses this vector of combined features to predict potential promoter regions. The algorithm has been trained and tested on E. coli and B. subtilis promoter datasets by the jackknife method. For E. coli σ70 promoters located in non-coding regions, the average prediction accuracy is 85.7%; for those located in coding regions and for several other kinds of prokaryotic promoters, the prediction accuracies are also about 80%. The results indicate that the method is a universal algorithm that outperforms most existing approaches on several performance measures. Furthermore, the framework of the method is extendable: it can accept new features to improve the prediction results efficiently. The OPSIA algorithm is also a useful tool for exploring composite motifs in newly uncovered promoter sequences.
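The selection-then-classification pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact stepwise criterion, the feature definitions, or OPSIA, so the synthetic data, the forward-only selection, the pooled covariance, the `min_gain` threshold, and all function names below are assumptions made for illustration. Features are added greedily while they raise the squared Mahalanobis distance between the two class means, and the selected features are then scored with QDA under leave-one-out (jackknife) testing.

```python
import numpy as np

def mahalanobis_sq(X0, X1):
    """Squared Mahalanobis distance between two class means (pooled covariance)."""
    diff = X1.mean(axis=0) - X0.mean(axis=0)
    n0, n1 = len(X0), len(X1)
    cov0 = np.atleast_2d(np.cov(X0, rowvar=False))
    cov1 = np.atleast_2d(np.cov(X1, rowvar=False))
    pooled = ((n0 - 1) * cov0 + (n1 - 1) * cov1) / (n0 + n1 - 2)
    return float(diff @ np.linalg.solve(pooled, diff))

def stepwise_select(X0, X1, min_gain=0.5):
    """Forward selection: add the feature with the largest distance gain.

    min_gain is a hypothetical stopping threshold, not taken from the paper.
    """
    selected, best = [], 0.0
    remaining = list(range(X0.shape[1]))
    while remaining:
        scores = {j: mahalanobis_sq(X0[:, selected + [j]], X1[:, selected + [j]])
                  for j in remaining}
        j_best = max(scores, key=scores.get)
        if scores[j_best] - best < min_gain:
            break
        selected.append(j_best)
        remaining.remove(j_best)
        best = scores[j_best]
    return selected

def qda_fit(X, y):
    """Per-class mean, inverse covariance, log-determinant, and log-prior."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        cov = np.atleast_2d(np.cov(Xc, rowvar=False))
        params[c] = (Xc.mean(axis=0), np.linalg.inv(cov),
                     np.linalg.slogdet(cov)[1], np.log(len(Xc) / len(X)))
    return params

def qda_predict(params, x):
    """Return the class with the highest quadratic discriminant score."""
    def score(c):
        mu, prec, logdet, logprior = params[c]
        d = x - mu
        return -0.5 * (d @ prec @ d) - 0.5 * logdet + logprior
    return max(params, key=score)

def jackknife_accuracy(X, y):
    """Leave-one-out accuracy: refit QDA without sample i, then classify it."""
    hits = 0
    for i in range(len(X)):
        keep = np.arange(len(X)) != i
        hits += qda_predict(qda_fit(X[keep], y[keep]), X[i]) == y[i]
    return hits / len(X)

# Synthetic stand-in for promoter (class 1) vs. non-promoter (class 0) features.
# Only features 0 and 2 carry a mean shift; the others are noise.
rng = np.random.default_rng(0)
n, p = 200, 5
X0 = rng.normal(0.0, 1.0, size=(n, p))
X1 = rng.normal(0.0, 1.3, size=(n, p))
X1[:, 0] += 2.0
X1[:, 2] += 1.5

selected = stepwise_select(X0, X1)
X = np.vstack([X0, X1])[:, selected]
y = np.repeat([0, 1], n)
accuracy = jackknife_accuracy(X, y)
print(selected, round(accuracy, 3))
```

In this sketch the features are selected once on the full dataset before the jackknife loop, which makes the accuracy estimate slightly optimistic; repeating the selection inside each leave-one-out fold would avoid that at extra cost.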
