
A study on component-based technology for development of complex bioinformatics software



Abstract

In the first chapter, “Enhancement of Support Vector Machines for Remote Protein Homology Detection and Fold Recognition,” M. Hilmi Muda, Puteh Saad, and Razib M. Othman present a comprehensive method based on two-layer multiclass classifiers. The first layer detects up to the superfamily and family levels of the SCOP hierarchy, using binary SVM classification rules optimized directly for ROC area. The second layer uses a discriminative SVM algorithm with a state-of-the-art string kernel based on PSI-BLAST profiles, which leverages unlabeled data; it detects up to the fold level of the SCOP hierarchy. The results were evaluated using mean ROC and mean MRFP. Experimental results show that their approach significantly improves remote protein homology detection on all three datasets (SCOP 1.53, 1.67, and 1.73). Compared with state-of-the-art methods, mean ROC improved by 0.03% on SCOP 1.53, 1.17% on SCOP 1.67, and 0.33% on SCOP 1.73.

In the second chapter, “Hybrid Clustering Support Vector Machines by Incorporating Protein Residue Information for Protein Local Structure Prediction,” Rohayanti Hassan, Puteh Saad, and Razib M. Othman develop a predictive algorithm named R-HCSVM for predicting protein local structure. It works in the following steps. First, the input information for R-HCSVM is pre-processed. Two types of input are needed: the protein residue score and the protein secondary structure class. ResiduePatchScore is introduced as a new method for pre-processing the protein residue score. It combines the protein conservation score, which carries rich functional information, with the protein propensity score, which carries rich secondary structural information, so the resulting residue score is able to avoid biased scoring. Second, protein sequences are segmented into continuous subsequences of length nine. The next step, another novel part of the study, introduces a hybrid clustering SVM to reduce training complexity: SOM and K-Means are integrated as the clustering algorithm to produce granular input, and an SVM is then used as the classifier. On protein sequence datasets obtained from the PISCES database, they found that R-HCSVM performs outstandingly, compared with other methods, in predicting protein local structure from a given protein subsequence.

In the third chapter, “Incorporating Gene Ontology with Conditional-based Clustering to Analyze Gene Expression Data,” Shahreen Kasim, Safaai Deris, and Razib M. Othman propose a clustering algorithm named BTreeBicluster. BTreeBicluster starts by building a GO tree and enriching it with expression similarity from the Saccharomyces genes. The BTreeBicluster algorithm is then applied to the enriched GO tree during the clustering process. It takes a subset of the conditions of the gene expression dataset, using discretized data. The annotation in the GO tree is therefore already determined before the clustering process starts, which has a major effect on the output clusters. The results show that BTreeBicluster produces more consistent annotation.

In the final chapter, “Improving Protein-Protein Interaction Prediction by a False Positive Filtration Process,” Rosfuzah Roslan and Razib M. Othman aim to increase the overlap between computational predictions and experimental results by partially removing false positive pairs from computationally predicted PPI datasets. Their study introduces PFP(), a protein function prediction method based on shared interacting domain patterns, to aid Gene Ontology Annotation (GOA). GOA and PFP() serve as agents in the filtration process to reduce false positives among computationally predicted PPI pairs. The functions predicted by PFP(), expressed as Gene Ontology (GO) IDs extracted from cross-species PPI data, are used to assign novel functional annotations to uncharacterized proteins and additional functions to proteins already characterized by GO. Because GOA is an ongoing process and a protein normally performs a variety of functions in different processes, using PFP() increased the chance of finding a matching function annotation for the first rule of the filtration process by as much as 20%. After the filtration process, large numbers of false positive pairs had been removed from the predicted datasets. The signal-to-noise ratio was used to measure the improvement made by the proposed filtration process, while strength values were used to evaluate the applicability of the whole proposed computational framework to the different computational PPI prediction methods.
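The filtration idea in the final chapter can be illustrated with a small sketch. The abstract does not spell out the filtration rules, so the code below assumes that the first rule keeps a predicted pair only when both proteins share at least one GO ID, and that PFP() predictions are simply merged into the GOA annotations before the check. The function names (`merge_annotations`, `filter_pairs`), the protein labels, and the GO IDs are hypothetical placeholders, not the authors' implementation or data.

```python
from typing import Dict, Iterable, List, Set, Tuple

Annotations = Dict[str, Set[str]]  # protein ID -> set of GO IDs
Pair = Tuple[str, str]             # one predicted interacting pair


def merge_annotations(goa: Annotations, pfp: Annotations) -> Annotations:
    """Combine curated annotations with predicted ones (assumed: plain set union)."""
    merged = {protein: set(terms) for protein, terms in goa.items()}
    for protein, terms in pfp.items():
        merged.setdefault(protein, set()).update(terms)
    return merged


def filter_pairs(pairs: Iterable[Pair], annotations: Annotations) -> List[Pair]:
    """Keep a pair only if both proteins share at least one GO ID (assumed rule)."""
    kept = []
    for a, b in pairs:
        if annotations.get(a, set()) & annotations.get(b, set()):
            kept.append((a, b))
    return kept


# Toy data: placeholder identifiers, not real annotations.
goa = {"P1": {"GO:0000001"}, "P2": {"GO:0000001"}, "P3": {"GO:0000002"}}
pfp = {"P4": {"GO:0000002"}, "P1": {"GO:0000003"}}
predicted = [("P1", "P2"), ("P1", "P3"), ("P3", "P4"), ("P2", "P5")]

goa_only = filter_pairs(predicted, goa)
with_pfp = filter_pairs(predicted, merge_annotations(goa, pfp))

print(goa_only)
print(with_pfp)
```

In this toy example the uncharacterized protein `P4` has no GOA entry, so the pair `("P3", "P4")` is discarded when only GOA is used and retained once the predicted annotation is merged in. This mirrors, in a very simplified way, the abstract's point that adding PFP() raises the chance of finding a matching annotation.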
