BMC Bioinformatics

A high performance prediction of HPV genotypes by Chaos game representation and singular value decomposition

Abstract

Background: Human papillomavirus (HPV) genotyping is an important approach to fighting cervical cancer because it provides relevant information for risk stratification in diagnosis and a better understanding of the relationship between HPV and carcinogenesis. This paper proposes two new feature extraction techniques, ChaosCentroid and ChaosFrequency, for predicting cancer-associated HPV genotypes. An additional, diversified set of 12 HPV genotypes (types 6, 11, 16, 18, 31, 33, 35, 45, 52, 53, 58, and 66) was studied. In the proposed techniques, a partitioned Chaos Game Representation (CGR) is used to represent HPV genomes. ChaosCentroid captures sequence structure through the centroid of each sub-region, using the Euclidean distances among the centroids and the center of the CGR as the relations among all sub-regions. ChaosFrequency extracts the statistical distribution of mono-, di-, or higher-order nucleotides along HPV genomes and forms a matrix of dot frequencies in each sub-region. For performance evaluation, four classifiers were used: Multi-layer Perceptron, Radial Basis Function, K-Nearest Neighbor, and Fuzzy K-Nearest Neighbor. The best result from each classifier was compared with the NCBI genotyping tool.

Results: The experimental results from the four classifiers followed the same trend. ChaosCentroid performed considerably better than ChaosFrequency when the input length was one, but moderately worse than ChaosFrequency when the input length was two. Both proposed techniques yielded nearly or exactly the best performance when the input length was more than three. However, there was no significant difference between the proposed techniques and the comparative alignment method.

Conclusions: The proposed alignment-free, scale-independent method can successfully transform HPV genomes of 7,000-10,000 base pairs into features of 1-11 dimensions. This indicates that ChaosCentroid and ChaosFrequency can serve as effective feature extraction techniques for predicting HPV genotypes.
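
The abstract describes the partitioned CGR and the two feature extractors only in outline. The following is a minimal pure-Python sketch of how they could work, not the authors' implementation. It assumes the common corner convention A=(0,0), C=(0,1), G=(1,1), T=(1,0); it assumes that an "input length" of k means a 2^k x 2^k grid of sub-regions; and it assumes that ChaosCentroid's relation can be summarized as each centroid's Euclidean distance to the CGR center. The names `cgr_points`, `partition`, `chaos_frequency`, and `chaos_centroid` are hypothetical. The singular value decomposition named in the title, which presumably reduces these matrices to the low-dimensional features, is not shown.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]

# Assumed corner convention (a common CGR layout on the unit square).
CORNERS: Dict[str, Point] = {
    "A": (0.0, 0.0),
    "C": (0.0, 1.0),
    "G": (1.0, 1.0),
    "T": (1.0, 0.0),
}


def cgr_points(seq: str) -> List[Point]:
    """Chaos Game Representation: each new point is the midpoint between
    the previous point and the corner of the current nucleotide."""
    x, y = 0.5, 0.5  # start at the center of the square
    points: List[Point] = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip ambiguous bases such as N (simplifying assumption)
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def partition(points: List[Point], k: int) -> Dict[Tuple[int, int], List[Point]]:
    """Group points into the cells of a 2^k x 2^k grid, keyed by (row, col).
    The first k-1 points are dropped so each cell count equals a k-mer count."""
    n = 2 ** k
    cells: Dict[Tuple[int, int], List[Point]] = {}
    for x, y in points[k - 1:]:
        row, col = min(int(y * n), n - 1), min(int(x * n), n - 1)
        cells.setdefault((row, col), []).append((x, y))
    return cells


def chaos_frequency(seq: str, k: int) -> List[List[int]]:
    """Hypothetical ChaosFrequency: matrix of dot counts per sub-region."""
    n = 2 ** k
    freq = [[0] * n for _ in range(n)]
    for (row, col), pts in partition(cgr_points(seq), k).items():
        freq[row][col] = len(pts)
    return freq


def chaos_centroid(seq: str, k: int) -> List[List[float]]:
    """Hypothetical ChaosCentroid: distance from each sub-region's centroid
    to the CGR center (0.5, 0.5). Empty sub-regions get 0.0 (an assumption)."""
    n = 2 ** k
    dist = [[0.0] * n for _ in range(n)]
    for (row, col), pts in partition(cgr_points(seq), k).items():
        mx = sum(p[0] for p in pts) / len(pts)
        my = sum(p[1] for p in pts) / len(pts)
        dist[row][col] = math.hypot(mx - 0.5, my - 0.5)
    return dist


if __name__ == "__main__":
    print(chaos_frequency("ACGT", 1))  # [[1, 1], [1, 1]]
    print(chaos_centroid("ACGTACGGTTAC", 2))
```

Dropping the first k - 1 points is what ties the grid to nucleotide statistics: every remaining point falls in the cell encoded by its last k nucleotides, so the frequency matrix at k = 1, 2, 3 holds mono-, di-, and trinucleotide counts.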
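
Of the four classifiers listed in the abstract, Fuzzy K-Nearest Neighbor is the least widely known. The following minimal sketch implements the standard distance-weighted fuzzy k-NN rule with crisp training labels: each of the k nearest neighbors contributes membership to its class with weight 1 / d^(2/(m-1)), and the class with the highest membership wins. The function names, the defaults k=3 and fuzzifier m=2, and the toy feature vectors labeled "HPV16" and "HPV18" are hypothetical; the abstract does not give the paper's settings.

```python
import math
from collections import defaultdict
from typing import Dict, Sequence


def fuzzy_knn(train_x: Sequence[Sequence[float]], train_y: Sequence[str],
              query: Sequence[float], k: int = 3, m: float = 2.0) -> Dict[str, float]:
    """Class memberships of `query` from its k nearest training vectors.
    Requires m > 1; m = 2 weights neighbors by inverse squared distance."""
    nearest = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )[:k]
    # An exact match would give infinite weight, so its label takes full membership.
    exact = [y for d, y in nearest if d == 0.0]
    if exact:
        return {y: exact.count(y) / len(exact) for y in set(exact)}
    weights: Dict[str, float] = defaultdict(float)
    total = 0.0
    for d, y in nearest:
        w = 1.0 / d ** (2.0 / (m - 1.0))
        weights[y] += w
        total += w
    return {y: w / total for y, w in weights.items()}


def predict(train_x, train_y, query, k: int = 3, m: float = 2.0) -> str:
    """Assign the class with the highest fuzzy membership."""
    memberships = fuzzy_knn(train_x, train_y, query, k, m)
    return max(memberships, key=memberships.get)


if __name__ == "__main__":
    # Toy 2-D feature vectors, for illustration only.
    xs = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
    ys = ["HPV16", "HPV16", "HPV18", "HPV18"]
    print(predict(xs, ys, [0.05, 0.05]))
    print(fuzzy_knn(xs, ys, [0.05, 0.05]))
```

Unlike plain K-Nearest Neighbor, the fuzzy variant returns graded memberships that sum to one, which can indicate how confidently a genome is assigned to a genotype.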
