Applying Fuzzy Technologies to Equivalence Learning in Protein Classification


Abstract

When a new genome is sequenced, its function and structure are important concerns, and the methods used to infer them are based on protein sequence similarity. However, sequence groups differ in parameters such as the number of group members and the intra- and inter-class variability, so a method that performs well on one group may not perform well on another. Learning similarity in a supervised manner could therefore provide a general framework for tailoring a similarity function to a specific sequence class. Here we describe a novel method that learns a similarity function between proteins by using a binary classifier, with pairs of equivalent sequences (belonging to the same class) as positive training samples and pairs of non-equivalent sequences (belonging to different classes) as negative training samples. For sequence pair representation, we propose to use advanced techniques from fuzzy theory, including a sigmoid-type function for normalization and the class of Dombi operators, which provides a more robust method. Under some additional constraints, the learned function turns out to be a valid kernel or metric function, and we present a new way of learning it, along with a new parameter-weighting technique. Using a dataset of archaeal, bacterial, and eukaryotic 3-phosphoglycerate kinase (3PGK) sequences and clusters from COG, we evaluate this equivalence learning method from a protein classification point of view. A receiver operating characteristic (ROC) analysis shows that we get a much more robust and accurate methodology for protein classification when these techniques are applied together. (See online Supplementary Material at www.liebertonline.com).
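The pair construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how sequences are turned into numeric vectors or exactly how the fuzzy operators are combined, so the feature vectors, the component-wise use of the Dombi conjunction, the parameter defaults, and all function names (`sigmoid_normalize`, `dombi_and`, `pair_features`, `build_pairs`) are assumptions made for illustration. The only standard ingredient used is the Dombi t-norm, 1 / (1 + (((1-x)/x)^λ + ((1-y)/y)^λ)^(1/λ)).

```python
import math
from itertools import combinations


def sigmoid_normalize(x, center=0.0, slope=1.0):
    """Map a raw feature value into (0, 1) with a sigmoid-type function.

    center and slope are hypothetical tuning parameters; very large
    negative inputs would overflow math.exp and are not handled here.
    """
    return 1.0 / (1.0 + math.exp(-slope * (x - center)))


def dombi_and(x, y, lam=2.0):
    """Dombi conjunction (t-norm) of two membership values in [0, 1].

    lam > 0 is the operator parameter; the value 2.0 is an arbitrary default.
    """
    if x == 0.0 or y == 0.0:
        return 0.0  # limit value of the t-norm when either argument is 0
    s = ((1.0 - x) / x) ** lam + ((1.0 - y) / y) ** lam
    return 1.0 / (1.0 + s ** (1.0 / lam))


def pair_features(u, v, lam=2.0):
    """Represent a sequence pair by combining normalized features (assumed scheme)."""
    nu = [sigmoid_normalize(a) for a in u]
    nv = [sigmoid_normalize(b) for b in v]
    return [dombi_and(a, b, lam) for a, b in zip(nu, nv)]


def build_pairs(vectors, labels, lam=2.0):
    """Build the training set for a binary classifier.

    Pairs from the same class are positive samples (1),
    pairs from different classes are negative samples (0).
    """
    X, y = [], []
    for i, j in combinations(range(len(vectors)), 2):
        X.append(pair_features(vectors[i], vectors[j], lam))
        y.append(1 if labels[i] == labels[j] else 0)
    return X, y


if __name__ == "__main__":
    # Toy feature vectors standing in for real sequence descriptors.
    vectors = [[0.2, 1.5], [0.1, 1.7], [-2.0, 0.3]]
    labels = ["3PGK", "3PGK", "other"]
    X, y = build_pairs(vectors, labels)
    for row, target in zip(X, y):
        print(target, [round(value, 3) for value in row])
```

The resulting `X` and `y` could be passed to any binary classifier; its decision score on a new pair would then serve as the learned similarity. Whether that score is a valid kernel or metric depends on the additional constraints mentioned in the abstract, which this sketch does not enforce.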

Bibliographic details

  • Source
    Journal of Computational Biology | 2009, Issue 4 | pp. 611-623 | 13 pages
  • Author affiliations

    Department of Informatics, University of Szeged, Szeged, Hungary

    Research Group on Artificial Intelligence of the Hungarian Academy of Sciences and University of Szeged, Szeged, Hungary

  • Original format: PDF
  • Language: English
