Research on the Accurate Detection of Mobile Sequences in Metagenomes

Chao Peng (彭超); Pu Wang (王普); Ruiquan Ge (葛瑞泉); Fengfeng Zhou (周丰丰)

Abstract

Genome assembly is one of the major challenges in metagenomic analysis. Assembly usually assumes that all sequencing reads originate from the same genome, but the highly active mobile elements in microbial genomes cast serious doubt on this assumption. This work formulates the issue as a binary classification problem: distinguishing mobile elements from host chromosomes. Accurate discrimination between the two could greatly facilitate further metagenomic research. After converting the metagenomic sequencing reads into numerical features, we investigated three feature selection algorithms (ReliefF, the chi-squared test and Fisher's t-test) in combination with four classification algorithms (logistic regression, extreme learning machine, support vector machine and random forest) to identify the best-performing mobile element detection model. Experimental results show that the model combining ReliefF feature selection with the random forest classifier correctly classifies more than 95% of the metagenomic sequencing reads using only 100 features, outperforming the model that uses all 690 features.
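The abstract names ReliefF as the best-performing feature selection algorithm but gives no implementation details. Below is a minimal Python sketch of the standard two-class ReliefF weighting: for each instance, take the k nearest same-class neighbours ("hits") and k nearest other-class neighbours ("misses") under a range-normalised Manhattan distance. It assumes each read has already been converted to a numeric feature vector. The function names `relieff` and `top_features`, the parameter values and the toy data are hypothetical illustrations, not the authors' code, features or data. In the two-class case, ReliefF's class-prior weighting of misses reduces to 1, so it is omitted.

```python
import random
from typing import List, Optional, Sequence


def relieff(X: Sequence[Sequence[float]], y: Sequence[int], k: int = 5,
            n_samples: Optional[int] = None, seed: int = 0) -> List[float]:
    """Two-class ReliefF: one relevance weight per feature (higher = more discriminative)."""
    n, d = len(X), len(X[0])
    # Scale each feature's differences into [0, 1]; constant features get span 1.
    spans = []
    for a in range(d):
        col = [row[a] for row in X]
        spans.append((max(col) - min(col)) or 1.0)

    def diff(a: int, i: int, j: int) -> float:
        return abs(X[i][a] - X[j][a]) / spans[a]

    def dist(i: int, j: int) -> float:
        # Manhattan distance over range-normalised features.
        return sum(diff(a, i, j) for a in range(d))

    rng = random.Random(seed)
    # Use every instance, or a random subset of n_samples instances.
    idx = list(range(n)) if n_samples is None else rng.sample(range(n), n_samples)
    m = len(idx)
    w = [0.0] * d
    for i in idx:
        neighbours = sorted((j for j in range(n) if j != i), key=lambda j: dist(i, j))
        hits = [j for j in neighbours if y[j] == y[i]][:k]    # nearest same-class
        misses = [j for j in neighbours if y[j] != y[i]][:k]  # nearest other-class
        for a in range(d):
            # Differing from same-class neighbours lowers the weight;
            # differing from other-class neighbours raises it.
            if hits:
                w[a] -= sum(diff(a, i, h) for h in hits) / (m * len(hits))
            if misses:
                w[a] += sum(diff(a, i, q) for q in misses) / (m * len(misses))
    return w


def top_features(weights: Sequence[float], n: int) -> List[int]:
    """Indices of the n highest-weighted features (e.g. n=100 out of 690)."""
    return sorted(range(len(weights)), key=lambda a: weights[a], reverse=True)[:n]


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise,
# feature 2 is constant. Labels: 0 = chromosome, 1 = mobile element.
X = [[0.0, 0.3, 1.0], [0.1, 0.9, 1.0], [0.2, 0.5, 1.0],
     [0.9, 0.4, 1.0], [1.0, 0.8, 1.0], [0.8, 0.2, 1.0]]
y = [0, 0, 0, 1, 1, 1]
w = relieff(X, y, k=2)
print(top_features(w, 3))  # → [0, 2, 1]
```

Keeping only the top-ranked indices (for example `top_features(w, 100)`) gives a reduced feature subset. A classifier, such as the random forest used in the paper's best model, would then be trained on those columns only. That classification step is not reproduced here.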

Bibliographic Information

Source: 《集成技术》 (Journal of Integration Technology), 2016, No. 2, pp. 85-96 (12 pages)
Authors: Chao Peng; Pu Wang; Ruiquan Ge; Fengfeng Zhou
Author affiliations:
Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055
Shenzhen College of Advanced Technology, University of Chinese Academy of Sciences, Shenzhen 518055
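Original format: PDF
Language of full text: Chinese
CLC classification: heat treatment processes
Keywords: gene classification; data mining; feature selection; genome barcode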
