Shallow learning model for diagnosing neuro muscular disorder from splicing variants

Kalimuthu Sathyavikasini; Vijayakumar Vijaya


Abstract

Purpose - Diagnosing a genetic neuromuscular disorder such as muscular dystrophy is complicated when the defect occurs during splicing. This paper aims to predict the type of muscular dystrophy from gene sequences by extracting well-defined descriptors related to splicing mutations. An automatic model is built to classify the disease through pattern recognition techniques coded in Python using the scikit-learn framework.

Design/methodology/approach - In this paper, the cloned gene sequences are synthesized from the mutation position and its location on the chromosome using the positional cloning approach. For instance, in the Human Gene Mutation Database (HGMD), a splicing mutation specified as IVS1-5 T > G denotes (IVS: intervening sequence, or intron) the first intron, five nucleotides before the consensus intron site AG, where nucleotide T is altered to G. IVS (+ve) denotes forward-strand 3′ positive numbers counted from the G of the invariant donor site, and IVS (-ve) denotes backward-strand 5′ negative numbers counted from the G of the acceptor site. The key idea in this paper is to identify discriminative descriptors from diseased gene sequences based on splicing variants and to provide an effective machine learning solution for predicting the type of muscular dystrophy from splicing mutations. Multi-class classification is carried out through data modeling of gene sequences. Synthetic mutated gene sequences are created because diseased gene sequences are not readily obtainable for this intricate disease. The positional cloning approach supports generating disease gene sequences from mutational information acquired from HGMD. SNP-, gene- and exon-based discriminative features are identified and used to train the model. A muscular dystrophy prediction model is built using supervised learning techniques in the scikit-learn environment.
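The HGMD-style splicing notation described above (for example IVS1-5 T > G) can be decoded mechanically. The following is a minimal sketch, not the authors' code: the names `SpliceVariant` and `parse_ivs` are hypothetical, and the pattern assumes only the simple single-nucleotide form quoted in the abstract (intron number, signed distance, reference base, altered base).

```python
import re
from typing import NamedTuple


class SpliceVariant(NamedTuple):
    """Hypothetical container for one decoded IVS-style splicing mutation."""
    intron: int   # intron number, e.g. 1 for IVS1
    offset: int   # signed distance from the splice-site G
    ref: str      # original nucleotide
    alt: str      # altered nucleotide
    site: str     # "donor" for +ve offsets, "acceptor" for -ve offsets


# Assumed pattern: IVS<intron><+/-><distance> <ref> > <alt>
_IVS = re.compile(
    r"^IVS(\d+)\s*([+-])\s*(\d+)\s*([ACGT])\s*>\s*([ACGT])$",
    re.IGNORECASE,
)


def parse_ivs(notation: str) -> SpliceVariant:
    """Decode a string such as 'IVS1-5 T > G' into its components."""
    match = _IVS.match(notation.strip())
    if match is None:
        raise ValueError(f"unrecognised splicing notation: {notation!r}")
    intron, sign, distance, ref, alt = match.groups()
    offset = int(distance) if sign == "+" else -int(distance)
    site = "donor" if sign == "+" else "acceptor"
    return SpliceVariant(int(intron), offset, ref.upper(), alt.upper(), site)


print(parse_ivs("IVS1-5 T > G"))
# SpliceVariant(intron=1, offset=-5, ref='T', alt='G', site='acceptor')
```

Keeping the sign in `offset` preserves the +ve/-ve convention from the abstract, so a descriptor such as distance to the splice site can be derived with `abs(offset)`.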
The data frame is built from the extracted features as a NumPy array. The data are normalized by transforming the feature values into the range 0 to 1, which scales the input attributes for the model. Naïve Bayes, decision tree, K-nearest neighbor and SVM models are developed in Python using scikit-learn.

Findings - To the best of the authors' knowledge, this is the first pattern recognition model to classify muscular dystrophy based on splicing mutations. Essential SNP-, gene- and exon-based descriptors related to splicing mutations are proposed and extracted from the cloned gene sequences. A model is built using statistical learning techniques through scikit-learn in the Anaconda framework. The paper also discusses the results of statistical learning carried out on the same set of gene sequences with synonymous and non-synonymous mutational descriptors.

Research limitations/implications - The data frame is built as a NumPy array. Normalizing the data by transforming the feature values into the range 0 to 1 scales the input attributes for the model. Naïve Bayes, decision tree, K-nearest neighbor and SVM models are developed in Python using scikit-learn. While training the SVM model, the cost, gamma and kernel parameters are tuned to attain good results. The classifiers are scored using tenfold cross-validation with the metric functions of the scikit-learn library. Results of the disease identification models based on non-synonymous, synonymous and splicing mutations were analyzed.

Practical implications - Essential SNP-, gene- and exon-based descriptors related to splicing mutations are proposed and extracted from the cloned gene sequences. A model is built using statistical learning techniques through scikit-learn in the Anaconda framework.
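The workflow summarized above (scaling features to the range 0 to 1, training four classifiers, scoring with tenfold cross-validation) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the feature matrix is synthetic data from `make_classification` standing in for the SNP-, gene- and exon-based descriptors, and the classifier settings are scikit-learn defaults rather than values from the paper.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic three-class data in place of the real descriptors.
X, y = make_classification(
    n_samples=300, n_features=12, n_informative=6,
    n_classes=3, random_state=0,
)

# The four model families named in the abstract, with default settings.
models = {
    "naive_bayes": GaussianNB(),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "knn": KNeighborsClassifier(),
    "svm": SVC(),
}

# Tenfold cross-validation; stratification keeps class ratios per fold.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

scores = {}
for name, clf in models.items():
    # Scaling sits inside the pipeline so each fold fits the min-max
    # range on its own training split only.
    pipe = make_pipeline(MinMaxScaler(), clf)
    scores[name] = cross_val_score(pipe, X, y, cv=cv, scoring="accuracy")
    print(f"{name}: mean accuracy {scores[name].mean():.3f}")
```

Placing `MinMaxScaler` inside the pipeline is a design choice of this sketch: it avoids leaking test-fold statistics into training, which the abstract does not specify either way.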
The performance of the classifiers is improved by using different estimators from the scikit-learn library. Several types of mutations, such as missense, nonsense and silent mutations, are also considered to build models through statistical learning techniques, and their results are analyzed.

Originality/value - To the best of the authors' knowledge, this is the first pattern recognition model to classify muscular dystrophy based on splicing mutations.
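The abstract mentions tuning the cost, gamma and kernel parameters of the SVM. One common way to do this in scikit-learn is a grid search scored by tenfold cross-validation; the sketch below assumes that approach, since the abstract does not say which search procedure was used. The data are synthetic and the grid values are arbitrary placeholders, not the paper's settings.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Assumption: synthetic three-class data in place of the real descriptors.
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=5,
    n_classes=3, random_state=0,
)

# Min-max scaling followed by the SVM, as one estimator.
pipe = Pipeline([("scale", MinMaxScaler()), ("svm", SVC())])

# Placeholder grid over cost (C), gamma and kernel.
# gamma is ignored by the linear kernel but harmless to include.
param_grid = {
    "svm__C": [0.1, 1, 10],
    "svm__gamma": ["scale", 0.1, 1.0],
    "svm__kernel": ["linear", "rbf"],
}

search = GridSearchCV(
    pipe,
    param_grid,
    cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=0),
    scoring="accuracy",
)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best mean accuracy:", round(search.best_score_, 3))
```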


Bibliographic details

Source: Military operations research, 2017, Issue 4, pp. 329-336 (8 pages)
Authors: Kalimuthu Sathyavikasini; Vijayakumar Vijaya
Affiliation (listed for both authors): Department of Computer Science, PSGR Krishnammal College for Women, Coimbatore, India
Original format: PDF
Language: English
Keywords:
Descriptors; Disease identification; Machine learning; Muscular dystrophy; Scikit learn; Splicing;
