IEEE International Conference on Machine Learning and Applications

Cell Type Identification from Single-Cell Transcriptomic Data via Gene Embedding


Abstract

Single-cell RNA sequencing (scRNAseq) profiles the transcriptomes of individual cells and, because scRNAseq experiments yield large volumes of data, makes it possible to characterize the heterogeneity of biological samples. Analyzing scRNAseq data can provide insight into cancer drug resistance, gene regulation in embryonic development, and the mechanisms of stem cell differentiation and reprogramming. A common goal of scRNAseq data analysis is to identify the type of each profiled cell. However, the limitations of current single-cell RNA sequencing techniques make data sparsity the main challenge. In this paper, inspired by similarities between gene systems and natural language systems, we propose a novel method that represents genes as gene embeddings in order to reduce the data sparsity of scRNAseq data for cell type identification. The method has two steps: 1) transform gene sequences into gene sentences by ranking genes according to their expression values; 2) apply the word2vec technique to learn gene embeddings from these gene sentences. We then build three deep learning models for cell type classification: RNNs, Attention RNNs, and Bi-directional LSTM RNNs. The proposed method is evaluated on macosko2015, a large-scale scRNAseq dataset with ground-truth labels for individual cell types. Experimental results show that the proposed method identifies cell types in scRNAseq data effectively and efficiently, and that it achieves promising performance even when learning from a limited number of genes.
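The two preprocessing steps in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `cell_to_gene_sentence` and `skipgram_pairs`, the choice to drop zero-expression genes, alphabetical tie-breaking, the optional `top_k` cutoff, and the toy gene names and values are all hypothetical. The second function only builds the (center, context) pairs that a skip-gram word2vec model trains on; the embeddings themselves would come from a word2vec implementation (for example gensim's `Word2Vec`, fed the list of gene sentences).

```python
from typing import List, Optional, Sequence, Tuple


def cell_to_gene_sentence(
    expression: Sequence[float],
    gene_names: Sequence[str],
    top_k: Optional[int] = None,
) -> List[str]:
    """Turn one cell's expression vector into a 'gene sentence'.

    Hypothetical choices (not stated in the abstract): genes with zero
    expression are dropped, genes are ordered from highest to lowest
    expression, ties are broken alphabetically, and the sentence is
    optionally truncated to the top_k most highly expressed genes.
    """
    if len(expression) != len(gene_names):
        raise ValueError("expression and gene_names must have the same length")
    # Keep only detected genes; this is an assumed way to sidestep zeros.
    expressed = [(v, g) for v, g in zip(expression, gene_names) if v > 0]
    # Sort by descending expression, then by gene name for determinism.
    expressed.sort(key=lambda pair: (-pair[0], pair[1]))
    sentence = [g for _, g in expressed]
    return sentence if top_k is None else sentence[:top_k]


def skipgram_pairs(sentence: Sequence[str], window: int = 2) -> List[Tuple[str, str]]:
    """(center, context) pairs within +/- window positions, as used by skip-gram."""
    pairs = []
    for i, center in enumerate(sentence):
        lo = max(0, i - window)
        hi = min(len(sentence), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, sentence[j]))
    return pairs


if __name__ == "__main__":
    # Toy data: two cells over five hypothetical genes.
    genes = ["Rho", "Gnat1", "Pde6b", "Actb", "Opn1mw"]
    cells = [[12.0, 3.0, 0.0, 5.0, 0.0], [0.0, 0.0, 4.0, 4.0, 9.0]]
    sentences = [cell_to_gene_sentence(c, genes) for c in cells]
    print(sentences)  # [['Rho', 'Actb', 'Gnat1'], ['Opn1mw', 'Actb', 'Pde6b']]
    print(skipgram_pairs(sentences[0], window=1))
    # [('Rho', 'Actb'), ('Actb', 'Rho'), ('Actb', 'Gnat1'), ('Gnat1', 'Actb')]
```

Ranking turns a sparse, mostly-zero expression vector into a short ordered token sequence, so genes that are highly expressed together end up as neighbors and share context pairs; whether the paper drops zero counts or truncates sentences this way is not specified in the abstract.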
