2012 IEEE 8th International Conference on E-Science

Data-intensive analysis of HIV mutations


Abstract

Mutations in the reverse transcriptase and protease of HIV patients may be related to drug resistance. Several issues make it difficult to fully elucidate the relationship between these mutations and drug resistance, such as cross-resistance and the limitations in detecting the relevance of resistance. Lookup tables and rule-based systems attempt to classify sequences and predict treatment failure. However, they depend on the scientific literature and on its quality and reliability. Data-intensive analysis of HIV mutation databases may help to corroborate or improve the knowledge spread across the literature. Pattern recognition algorithms classify data by extracting information from different data domains. Clustering and biclustering algorithms have been explored to group scientific and business data based on similarity measures. K-means is a popular clustering algorithm, and Bimax is used with binary data. In this scenario, the main contribution of this work is a new methodology based on K-means and Bimax that uses a binary data representation of reverse transcriptase and protease sequences, in an attempt to obtain an unsupervised classification of the sequences that may be related to drug resistance. In our work, 14,393 sequences, restricted to selected protein positions known to be related to drug resistance and represented in an 82-dimensional vector space, are analyzed by pattern recognition algorithms. The sequences are represented as binary vectors. A suitable visualization of these vectors is produced for medical interpretation; it indicates some correspondence with the drug resistance predictions of the Brazilian lookup table used by Brazilian physicians, a table whose creation itself depends on the HIV literature and its quality. As a consequence, in this work we describe a methodology based on the application of pattern recognition algorithms to binary data in order to suggest clusters of mutations and their relations with drug resistance, using a different cluster visualization scheme.
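The binary representation and the K-means step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The position numbers, the toy sequences, the helper names (`encode`, `kmeans`) and the farthest-point initialization are all assumptions made for illustration; the abstract does not give the 82 positions used or the details of how K-means was configured, and Bimax is not covered here.

```python
# Illustrative sketch only: toy data and hypothetical helper names,
# not the positions, data, or code used in the paper.

def encode(mutated_positions, selected_positions):
    """Binary vector: 1 if the sequence is mutated at a selected position, else 0."""
    present = set(mutated_positions)
    return [1 if p in present else 0 for p in selected_positions]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, max_iters=100):
    """Plain Lloyd's K-means with a deterministic farthest-point initialization."""
    # Assumed initialization: start from the first vector, then repeatedly
    # add the vector farthest from the centroids chosen so far.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(farthest))

    labels = None
    for _ in range(max_iters):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical selected positions (the paper uses 82; six are used here).
positions = [1, 2, 3, 4, 5, 6]
# Toy sequences, each given as the list of positions where it is mutated.
toy_sequences = [[1, 2], [1, 2, 3], [5, 6], [4, 5, 6], [1, 2], [5, 6]]

vectors = [encode(s, positions) for s in toy_sequences]
labels, centroids = kmeans(vectors, k=2)
print(labels)
```

On this toy input the two mutation patterns (positions 1 to 3 versus positions 4 to 6) end up in separate clusters. Each centroid coordinate is the fraction of cluster members mutated at that position, which is one simple quantity a cluster visualization could display.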
