Journal: Bioinformatics

AllergenFP: allergenicity prediction by descriptor fingerprints



Abstract

Motivation: Allergenicity, like antigenicity and immunogenicity, is a property encoded both linearly and non-linearly, so alignment-based approaches cannot identify it unambiguously. A novel alignment-free, descriptor-based fingerprint approach is presented here and applied to identify allergens and non-allergens. The approach was implemented as a four-step algorithm. Initially, the protein sequences are described by amino acid principal properties such as hydrophobicity, size, relative abundance, and helix- and beta-strand-forming propensities. Then, the generated strings of different lengths are converted into vectors of equal length by auto- and cross-covariance (ACC) transformation. The vectors are then transformed into binary fingerprints and compared in terms of the Tanimoto coefficient.

Results: The approach was applied to a set of 2427 known allergens and 2427 non-allergens and correctly identified 88% of them, with a Matthews correlation coefficient of 0.759. The descriptor fingerprint approach presented here is universal: it could be applied to any classification problem in computational biology. The set of E-descriptors is able to capture the main structural and physicochemical properties of the amino acids that build the proteins. The ACC transformation overcomes the main problem in alignment-based comparative studies, which arises from the different lengths of the aligned protein sequences. The conversion of protein ACC values into binary descriptor fingerprints allows similarity search and classification.
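The pipeline described in the abstract (descriptor encoding, ACC transformation, binarization, Tanimoto comparison) can be illustrated with a minimal Python sketch. Several details here are assumptions rather than the published method: the descriptor table `TOY_DESCRIPTORS` contains invented placeholder numbers (not the real E-descriptor values) for only four residues, the maximum lag of 2 is arbitrary, and thresholding ACC values at zero is just one plausible way to obtain a binary fingerprint. All function names are hypothetical.

```python
from itertools import product

# Placeholder 5-dimensional descriptors for four residues.
# ASSUMPTION: these numbers are invented for illustration only;
# they are NOT the published E-descriptor values.
TOY_DESCRIPTORS = {
    "A": (0.1, -0.3, 0.2, 0.5, -0.1),
    "C": (-0.4, 0.2, 0.6, -0.2, 0.3),
    "D": (0.7, -0.1, -0.5, 0.1, 0.4),
    "E": (-0.2, 0.5, -0.3, -0.6, 0.2),
}


def encode(sequence, table):
    """Map each residue to its descriptor tuple (one row per residue)."""
    return [table[residue] for residue in sequence]


def acc_transform(rows, max_lag):
    """Auto- and cross-covariance: a fixed-length vector for any sequence length.

    For each lag and each ordered descriptor pair (j, k), average the product
    of descriptor j at position i and descriptor k at position i + lag.
    The result always has dims * dims * max_lag elements.
    """
    n = len(rows)
    if n <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    dims = len(rows[0])
    vector = []
    for lag in range(1, max_lag + 1):
        for j, k in product(range(dims), repeat=2):
            total = sum(rows[i][j] * rows[i + lag][k] for i in range(n - lag))
            vector.append(total / (n - lag))
    return vector


def to_fingerprint(vector, threshold=0.0):
    """Binarize ACC values. ASSUMPTION: a simple threshold at zero."""
    return [1 if value > threshold else 0 for value in vector]


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two equal-length binary fingerprints."""
    both = sum(a & b for a, b in zip(fp_a, fp_b))
    either = sum(a | b for a, b in zip(fp_a, fp_b))
    # Convention used here: two all-zero fingerprints count as identical.
    return both / either if either else 1.0


def fingerprint(sequence, max_lag=2):
    """Full toy pipeline: encode, ACC-transform, binarize."""
    rows = encode(sequence, TOY_DESCRIPTORS)
    return to_fingerprint(acc_transform(rows, max_lag))


# Sequences of different lengths yield fingerprints of the same length.
fp_short = fingerprint("ACDEA")
fp_long = fingerprint("ACDEACDDEEAC")
print(len(fp_short), len(fp_long))
print(tanimoto(fp_short, fp_long))
```

With five descriptors and a maximum lag of 2, both fingerprints have 5 × 5 × 2 = 50 bits regardless of sequence length, which is what makes a direct Tanimoto comparison possible. A classifier could then, for example, assign a query protein the label of its most similar known fingerprint; that nearest-neighbour step is a suggestion for illustration, not something the abstract specifies.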
