IEEE International Conference on E-Science

Data-intensive analysis of HIV mutations


Abstract

Mutations in the reverse transcriptase and protease of HIV patients may be related to drug resistance. Several issues make it difficult to fully elucidate the relationship between these mutations and drug resistance, such as cross-resistance and the limited ability to detect the relevance of resistance. Lookup tables and rule-based systems attempt to classify sequences and predict treatment failure. However, they depend on the scientific literature and on its quality and reliability. Data-intensive analysis of HIV mutation databases may help to corroborate or improve the knowledge spread across the literature. Pattern recognition algorithms classify data by extracting information from different data domains. Clustering and biclustering algorithms have been explored to group scientific and business data based on similarity measures. K-means is a popular clustering algorithm, and Bimax is a biclustering algorithm that operates on binary data. Given this scenario, the main contribution of this work is a new methodology based on K-means and Bimax that uses a binary representation of reverse transcriptase and protease sequences, in an attempt to obtain an unsupervised classification of the sequences that may be related to drug resistance. In our work, 14,393 sequences, restricted to selected protein positions known to be related to drug resistance and represented in an 82-dimensional vector space, are analyzed by pattern recognition algorithms. The sequences are represented as binary vectors. A suitable visualization of these vectors is produced for medical interpretation; it indicates some correspondence with the drug resistance predictions given by the Brazilian lookup table used by Brazilian physicians, a table whose construction itself depends on the HIV literature and its quality. Consequently, in this work we describe a methodology that applies pattern recognition algorithms to binary data in order to suggest clusters of mutations and their relations with drug resistance, using a different cluster visualization scheme.
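To make the binary representation concrete, the following is a minimal sketch of how sequences might be encoded as 0/1 vectors and grouped with K-means. It is not the authors' implementation: the abstract does not specify the encoding, so the four-entry `FEATURES` list (standing in for the 82 dimensions), the toy samples, the helper names, and the deterministic farthest-point initialization are all assumptions made for illustration. The Bimax step is not shown.

```python
# Hypothetical feature list: (protein, position, mutant amino acid).
# The paper works in 82 dimensions; these four entries are illustrative only.
FEATURES = [("RT", 41, "L"), ("RT", 184, "V"), ("PR", 46, "I"), ("PR", 90, "M")]


def encode(mutations):
    """Map a set of observed mutations to a 0/1 vector over FEATURES."""
    return [1 if feature in mutations else 0 for feature in FEATURES]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(vectors, k):
    """Deterministic farthest-point initialization (an assumption, chosen
    here only so that the example is reproducible)."""
    centroids = [[float(x) for x in vectors[0]]]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append([float(x) for x in farthest])
    return centroids


def kmeans(vectors, k, max_iters=50):
    """Plain Lloyd iteration: assign each vector to its nearest centroid,
    then recompute each centroid as the mean of its members."""
    centroids = init_centroids(vectors, k)
    labels = None
    for _ in range(max_iters):
        new_labels = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy samples (made up): each is the set of mutations seen in one sequence.
samples = [
    {("RT", 41, "L"), ("RT", 184, "V")},
    {("RT", 41, "L"), ("RT", 184, "V")},
    {("RT", 184, "V")},
    {("PR", 46, "I"), ("PR", 90, "M")},
    {("PR", 46, "I"), ("PR", 90, "M")},
    {("PR", 90, "M")},
]

vectors = [encode(s) for s in samples]
labels, centroids = kmeans(vectors, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

On this toy input the reverse transcriptase and protease patterns fall into separate clusters. With real data the number of clusters, the initialization, and the distance measure would all need to be chosen and validated against the resistance annotations.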
