Source: PLoS Genetics

Deep sequencing of HBV pre-S region reveals high heterogeneity of HBV genotypes and associations of word pattern frequencies with HCC


Abstract

Hepatitis B virus (HBV) infection is common worldwide, especially in China. More than 60–80% of hepatocellular carcinoma (HCC) cases can be attributed to HBV infection in regions of high HBV prevalence. Although traditional Sanger sequencing has been extensively used to investigate HBV sequences, next-generation sequencing (NGS) is becoming more commonly used. However, it is unknown whether word pattern frequencies of HBV reads obtained by NGS can be used to investigate HBV genotypes and predict HCC status. In this study, we used NGS to sequence the pre-S region of the HBV genome in 94 HCC patients and 45 individuals with chronic HBV (CHB) infection. Word pattern frequencies were calculated from the sequence data of each individual and compared between individuals using the Manhattan distance. The individuals were grouped using principal coordinate analysis (PCoA) and hierarchical clustering. Word pattern frequencies were also used to build prediction models for HCC status using both K-nearest neighbors (KNN) and support vector machine (SVM) classifiers. We showed the extremely high power of analyzing HBV sequences using word patterns. Our key findings include that the first principal coordinate of the PCoA was highly associated with the fraction of genotype B (or C) sequences and the second principal coordinate was significantly associated with the probability of having HCC. Hierarchical clustering grouped the individuals first according to their major genotypes and then according to their HCC status. Using cross-validation, a high area under the receiver operating characteristic curve (AUC) was obtained: around 0.88 for KNN and 0.92 for SVM. In an independent data set of 46 HCC patients and 31 CHB individuals, a good AUC of 0.77 was obtained using SVM. It was further shown that 3000 reads per individual can yield stable prediction results for SVM. Thus, another key finding is that word patterns can be used to predict HCC status with high accuracy. Therefore, our study shows clearly that word pattern frequencies of HBV sequences contain much information about the composition of different HBV genotypes and the HCC status of an individual.
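
To illustrate the first two steps described in the abstract (word pattern frequencies and the Manhattan distance between individuals), here is a minimal sketch in Python. It is not the authors' implementation. The word length `k`, the pooling of all reads from one individual, the skipping of words with ambiguous bases, and the function names `word_frequencies` and `manhattan` are all assumptions made for illustration; the abstract does not specify them.

```python
from collections import Counter
from itertools import product


def word_frequencies(reads, k=3):
    """Return the frequency vector of all 4**k DNA words for one individual.

    Assumption: counts are pooled over all reads of the individual and
    normalised to sum to 1. The value of k is illustrative only.
    """
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            word = read[i:i + k]
            # Skip words containing N or other ambiguity codes.
            if set(word) <= set("ACGT"):
                counts[word] += 1
    total = sum(counts.values())
    # Fixed word order so that vectors from different individuals align.
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    return [counts[w] / total if total else 0.0 for w in words]


def manhattan(u, v):
    """Manhattan (L1) distance between two equal-length frequency vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


if __name__ == "__main__":
    # Toy reads, invented for the example; not real HBV data.
    individual_a = ["ACGTACGTAC", "ACGTTTGCAA"]
    individual_b = ["ACGTACGAAC", "TTGCAAGGCA"]
    fa = word_frequencies(individual_a, k=3)
    fb = word_frequencies(individual_b, k=3)
    print(manhattan(fa, fb))
```

A pairwise distance matrix built this way is the kind of input that PCoA and hierarchical clustering operate on, and the frequency vectors themselves could serve as features for KNN or SVM classifiers.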
