Human pol II promoter prediction: time series descriptors and machine learning

Gangal R; Sharma P

Abstract

Although several in silico promoter prediction methods have been developed to date, their predictive performance remains limited. These limitations stem from the difficulty of selecting promoter features that distinguish promoters from non-promoters, and from the generalization or predictive ability of the machine-learning algorithms. In this paper we attempt to define a novel approach that uses unique descriptors and machine-learning methods to recognize eukaryotic polymerase II promoters. In this study, non-linear time series descriptors, along with non-linear machine-learning algorithms such as the support vector machine (SVM), are used to discriminate between promoter and non-promoter regions. The basic idea is to use descriptors that do not depend on the primary DNA sequence and that provide a clear distinction between promoter and non-promoter regions. The classification model, built on a set of 1000 promoter and 1500 non-promoter sequences, showed a 10-fold cross-validation accuracy of 87%, and an independent test set had an accuracy > 85% in both promoter and non-promoter identification. This approach correctly identified all 20 experimentally verified promoters of human chromosome 22. The high sensitivity and selectivity indicate that n-mer frequencies, along with non-linear time series descriptors such as Lyapunov component stability and Tsallis entropy, and supervised machine-learning methods such as SVMs, can be useful in the identification of pol II promoters.
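To make the descriptors named in the abstract more concrete, the following is a minimal sketch of two of them: n-mer frequencies and a Tsallis entropy computed over those frequencies. The abstract does not say how the authors derived their descriptors, so everything here is an assumption for illustration: the function names (`kmer_frequencies`, `tsallis_entropy`, `describe`), the choice of overlapping n-mers over the alphabet ACGT, and the defaults `k=2` and `q=2.0`. The Lyapunov-based descriptor and the SVM training step are not shown.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=2):
    """Relative frequency of each overlapping k-mer over the alphabet ACGT.

    Windows containing other symbols (for example N) are ignored.
    This windowing scheme is an assumption, not taken from the paper.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    if total == 0:
        raise ValueError("sequence has no valid k-mers")
    return {m: counts[m] / total for m in kmers}


def tsallis_entropy(probs, q=2.0):
    """Tsallis entropy S_q = (1 - sum(p_i ** q)) / (q - 1), defined for q != 1."""
    if q == 1:
        raise ValueError("q = 1 is the Shannon limit and is not handled here")
    return (1.0 - sum(p ** q for p in probs if p > 0)) / (q - 1.0)


def describe(seq, k=2, q=2.0):
    """Feature vector: the 4**k k-mer frequencies followed by their Tsallis entropy."""
    freqs = kmer_frequencies(seq, k)
    return list(freqs.values()) + [tsallis_entropy(freqs.values(), q)]
```

A vector such as `describe(sequence)` could then be passed, one row per sequence, to any supervised classifier. The abstract names SVMs evaluated with 10-fold cross-validation, but the kernel and parameters are not given there.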


Bibliographic record

Source
Nucleic Acids Research | 2005, Issue 4 | 5 pages
Authors
Gangal R; Sharma P
Original format: PDF
Language: English
Chinese Library Classification: Cell morphology
Keywords
Cuticle; cracking; epidermis; fruit growth; Lycopersicon esculentum; plant biomechanics; ripening; stiffening; tomato;
