Journal: Scientific Reports

Identification of DEP domain-containing proteins by a machine learning method and experimental analysis of their expression in human HCC tissues


Abstract

The Dishevelled/EGL-10/Pleckstrin (DEP) domain-containing (DEPDC) proteins have seven members. However, whether this superfamily can be distinguished from other proteins on the basis of amino acid sequence alone remains unknown. Here, we describe a computational method to separate DEPDCs from non-DEPDCs. First, we examined the Pfam numbers of the known DEPDCs and used the longest sequences for each Pfam to construct a phylogenetic tree. Subsequently, we extracted 188-dimensional (188D) and 20D features from DEPDCs and non-DEPDCs and classified them with a random forest classifier. We also mined the motifs of human DEPDCs to find the related domains. Finally, we designed experiments to verify human DEPDC expression at the mRNA level in hepatocellular carcinoma (HCC) and adjacent normal tissues. The phylogenetic analysis showed that the DEPDC superfamily can be divided into three clusters. Moreover, both the 188D and the 20D features can be used to effectively distinguish the two protein types. Motif analysis revealed that the DEP and RhoGAP domains were common in human DEPDCs. Human HCC and the adjacent tissues widely expressed DEPDCs; however, their regulation was not identical. In conclusion, we successfully constructed a binary classifier for DEPDCs and experimentally verified their expression in human HCC tissues.
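The feature-extraction step can be illustrated with a minimal Python sketch. It assumes the 20D features are the relative frequencies of the 20 standard amino acids, which the abstract does not state explicitly. The function name `aa_composition` and the example sequence are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import List

# The 20 standard amino acids in a fixed order, so that every
# sequence maps to a vector with the same layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(sequence: str) -> List[float]:
    """Return a 20-dimensional amino acid composition vector.

    Assumption: "20D features" means the fraction of each standard
    residue in the sequence. Non-standard symbols (e.g. X, B, Z)
    are ignored rather than counted.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        # No standard residues found: return an all-zero vector.
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical toy sequence, for illustration only.
features = aa_composition("AAAC")
print(len(features), features[0], features[1])
```

Vectors of this form, one per protein, could then be supplied together with DEPDC/non-DEPDC labels to a random forest implementation. The 188D features would require a separate, more elaborate encoding that is not sketched here.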
