International Journal of Medical Informatics

Gene functional annotation by statistical analysis of biomedical articles

Abstract

Background: Functional annotation of genes is an important task in biology, since it facilitates the characterization of gene relationships and the understanding of biochemical pathways. Gene functions can be described by standardized, structured vocabularies called bio-ontologies. Bio-ontology terms are assigned to genes by applying computational methods to datasets extracted from biomedical articles. These methods originate in data mining and machine learning and include maximum entropy and support vector machines (SVM).

Purpose: The aim of this paper is to propose an alternative to the existing methods for functional gene annotation. The methodology involves building classification models, validating them and representing the results graphically, and reducing the dimensionality of the dataset.

Methods: Classification models are constructed by linear discriminant analysis (LDA). The models are validated through statistical analysis and interpretation of the results, using techniques such as hold-out samples and test datasets, and metrics such as the confusion matrix, accuracy, recall, precision and F-measure. Graphical representations of the variables produced by the classification models, such as boxplots, Andrews curves and scatterplots, are also used to validate and interpret the results.

Results: The proposed methodology was applied to a dataset extracted from biomedical articles for 12 Gene Ontology terms. Validation of the LDA models and comparison with SVM show that, on this dataset, LDA (mean F-measure 75.4%) outperforms SVM (mean F-measure 68.7%).

Conclusion: Certain statistical methods can be beneficial for functional gene annotation from biomedical articles. Apart from their good performance, the results can be interpreted and give insight into the structure of the bio-text data.
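The abstract names linear discriminant analysis but does not give the authors' implementation, features or priors. As a rough illustration only, here is a minimal pure-Python sketch of two-class Fisher LDA, assuming each article has already been turned into a numeric feature vector and labelled 1 if it supports a given Gene Ontology term and 0 otherwise. The helper names (`fit_lda`, `predict`), the small ridge term, the equal-prior threshold and the toy data are all hypothetical choices, not taken from the paper.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def _mean(rows: Sequence[Sequence[float]]) -> Vector:
    """Column-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def _solve(a: List[Vector], b: Vector) -> Vector:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_lda(X: Sequence[Sequence[float]], y: Sequence[int],
            ridge: float = 1e-6) -> Tuple[Vector, float]:
    """Fisher LDA for 0/1 labels: returns (w, threshold); class 1 if w.x > threshold."""
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    mu1, mu0 = _mean(pos), _mean(neg)
    d = len(mu1)
    # Pooled within-class scatter matrix S_w; the (hypothetical) ridge keeps it invertible.
    sw = [[ridge if i == j else 0.0 for j in range(d)] for i in range(d)]
    for rows, mu in ((pos, mu1), (neg, mu0)):
        for x in rows:
            diff = [xi - mi for xi, mi in zip(x, mu)]
            for i in range(d):
                for j in range(d):
                    sw[i][j] += diff[i] * diff[j]
    # Discriminant direction w = S_w^{-1} (mu1 - mu0).
    w = _solve(sw, [a - b for a, b in zip(mu1, mu0)])
    # Equal-prior threshold (assumption): midpoint of the two projected class means.
    threshold = sum(wi * (a + b) / 2 for wi, a, b in zip(w, mu1, mu0))
    return w, threshold


def predict(w: Vector, threshold: float, X: Sequence[Sequence[float]]) -> List[int]:
    """Project each vector onto w and compare with the threshold."""
    return [1 if sum(wi * xi for wi, xi in zip(w, x)) > threshold else 0 for x in X]


# Hypothetical toy data with a hold-out split: fit on train_*, evaluate on test_X.
train_X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [2.0, 2.1], [2.2, 1.9], [1.9, 2.3]]
train_y = [0, 0, 0, 1, 1, 1]
test_X = [[0.1, 0.2], [2.1, 2.0]]
w, threshold = fit_lda(train_X, train_y)
print(predict(w, threshold, test_X))  # [0, 1]
```

In practice a library implementation such as scikit-learn's `LinearDiscriminantAnalysis` would normally be used; the from-scratch version only makes the computation of the discriminant direction explicit.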
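The validation metrics listed in the abstract's Methods all derive from the binary confusion matrix of one term's classifier. The sketch below assumes the balanced F-measure (F1) and assumes that the reported "mean F-measure" is an unweighted average over the per-term scores; the abstract states neither, and the function names and toy labels are illustrative only.

```python
from statistics import mean
from typing import Dict, Iterable, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Counts for a binary task where 1 = 'term assigned', 0 = 'not assigned'."""
    pairs = list(zip(y_true, y_pred))
    return {
        "tp": sum(1 for t, p in pairs if t == 1 and p == 1),
        "fp": sum(1 for t, p in pairs if t == 0 and p == 1),
        "fn": sum(1 for t, p in pairs if t == 1 and p == 0),
        "tn": sum(1 for t, p in pairs if t == 0 and p == 0),
    }


def scores(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and balanced F-measure (F1) from the counts."""
    cm = confusion_matrix(y_true, y_pred)
    tp, fp, fn, tn = cm["tp"], cm["fp"], cm["fn"], cm["tn"]
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


def mean_f_measure(per_term: Iterable[Dict[str, float]]) -> float:
    """Unweighted (macro) average of per-term F-measures (assumed aggregation)."""
    return mean(s["f_measure"] for s in per_term)


# Toy hold-out labels for one term: 2 TP, 1 FN, 1 FP, 4 TN.
toy = scores([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0])
print(toy["accuracy"], round(toy["f_measure"], 3))  # 0.75 0.667
```

Precision and recall can diverge sharply when positives are rare, which is typical for a single ontology term, so the F-measure is usually more informative than accuracy alone.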

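Andrews curves, one of the graphical checks mentioned in the abstract, map each observation x = (x1, x2, ..., xd) to the finite Fourier series f_x(t) = x1/√2 + x2 sin t + x3 cos t + x4 sin 2t + x5 cos 2t + ... over -π ≤ t ≤ π; plotting one curve per article, coloured by class, shows whether the classes separate. The sketch below only computes curve values and assumes x is the vector of variables produced by the classification model for one article; the function names are illustrative. For actual plotting, pandas provides `pandas.plotting.andrews_curves`.

```python
import math
from typing import List, Sequence


def andrews_curve(x: Sequence[float], t: float) -> float:
    """Evaluate f_x(t) = x1/sqrt(2) + x2 sin t + x3 cos t + x4 sin 2t + x5 cos 2t + ..."""
    total = x[0] / math.sqrt(2)
    for k, value in enumerate(x[1:], start=1):
        harmonic = (k + 1) // 2  # coefficients pair up as sin/cos of 1t, 2t, 3t, ...
        trig = math.sin if k % 2 == 1 else math.cos
        total += value * trig(harmonic * t)
    return total


def sample_curve(x: Sequence[float], n: int = 200) -> List[float]:
    """Sample the curve at n >= 2 evenly spaced points on [-pi, pi] for plotting."""
    ts = [-math.pi + 2 * math.pi * i / (n - 1) for i in range(n)]
    return [andrews_curve(x, t) for t in ts]


# Hypothetical observation with three variables, sampled at 5 points.
print(sample_curve([1.0, 2.0, 3.0], n=5))
```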