Conference: New York Scientific Data Summit

Multimodal biological analysis using NLP and expression profile



Abstract

The goal of this project is to gather biological data from different sources and evaluate them jointly through computational analysis. Two data sources were used: microarray gene expression data for Arabidopsis thaliana, and gene co-occurrences in the scientific literature, extracted from bioRxiv using natural language processing (NLP). For the analysis, the microarray data was normalized, its dimensionality was reduced with principal component analysis (PCA), and it was grouped into clusters with K-means clustering, using several different numbers of clusters. These expression clusters were then compared with the co-occurrence pairs in the NLP data to evaluate the quality of the NLP extractions. The evaluation applied entropy analysis to the combined data and compared the result with the maximum entropy of the clustering alone. It shows that the NLP extractions do correspond to the clusters from the microarray data and may be used for further analysis.
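The preprocessing steps in the abstract (normalize, reduce with PCA, cluster with K-means at several cluster counts) can be sketched in NumPy as follows. This is a minimal sketch, not the authors' code: the abstract does not say which normalization, how many principal components, or which K-means implementation was used, so per-gene z-scoring, a 2-component PCA via SVD, plain Lloyd's K-means, the function names `zscore_genes`, `pca` and `kmeans`, and the synthetic matrix are all illustrative assumptions. Genes are treated as the points being clustered because the clusters are later compared with gene pairs.

```python
import numpy as np


def zscore_genes(X):
    """Assumed normalization: scale each gene (row) to mean 0, std 1 across samples."""
    mu = X.mean(axis=1, keepdims=True)
    sd = X.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0  # constant genes become all zeros instead of dividing by zero
    return (X - mu) / sd


def pca(X, n_components):
    """Project each row of X onto the top principal components (computed via SVD)."""
    Xc = X - X.mean(axis=0)  # center every column (sample)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm with random initial centroids; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every point to its nearest centroid by squared Euclidean distance.
        dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Move each centroid to the mean of its points; keep it if the cluster is empty.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids


# Hypothetical stand-in for a microarray matrix (rows = genes, columns = samples);
# the real Arabidopsis data is not reproduced here.
rng = np.random.default_rng(1)
trend = np.linspace(0.0, 3.0, 8)
X = np.vstack([rng.normal(0.0, 1.0, (30, 8)) + trend,
               rng.normal(0.0, 1.0, (30, 8)) - trend])

Z = pca(zscore_genes(X), n_components=2)
# Cluster at several values of k and keep every labeling for later comparison.
results = {k: kmeans(Z, k)[0] for k in (2, 3, 4)}
```

In practice a library implementation such as scikit-learn's `KMeans` (with several restarts) would usually replace the hand-written loop; it is spelled out here only to make each step visible.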
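The abstract does not define its entropy measure exactly. One hypothetical reading, sketched below in plain Python: for every NLP co-occurrence pair whose two genes both received cluster labels, compute the conditional entropy H(C_b | C_a) of the partner gene's cluster given the first gene's cluster, and compare it with log2(k), the maximum entropy of a uniform choice among k clusters. A value well below log2(k) means that co-occurring genes fall into predictable clusters. The helper names `conditional_entropy` and `max_entropy`, the `(gene_a, gene_b)` pair format, and the gene-to-cluster dictionary are assumptions for illustration.

```python
import math
from collections import Counter


def conditional_entropy(pairs, labels):
    """H(C_b | C_a) in bits over co-occurrence pairs whose genes are both clustered.

    pairs:  iterable of (gene_a, gene_b) tuples (hypothetical NLP output format)
    labels: dict mapping gene id -> cluster index (hypothetical)
    """
    # Count each pair in both orientations so text order does not matter;
    # pairs with an unclustered gene are skipped.
    joint = Counter()
    for a, b in pairs:
        if a in labels and b in labels:
            joint[(labels[a], labels[b])] += 1
            joint[(labels[b], labels[a])] += 1
    total = sum(joint.values())
    if total == 0:
        raise ValueError("no pair has both genes clustered")
    marginal = Counter()
    for (ca, _), n in joint.items():
        marginal[ca] += n
    # H(C_b | C_a) = -sum over (a, b) of p(a, b) * log2 p(b | a)
    h = 0.0
    for (ca, _), n in joint.items():
        h -= (n / total) * math.log2(n / marginal[ca])
    return h


def max_entropy(k):
    """Entropy of a uniform choice among k clusters: the no-information baseline."""
    return math.log2(k)


labels = {"g1": 0, "g2": 0, "g3": 1, "g4": 1}        # toy cluster assignments
pairs = [("g1", "g2"), ("g3", "g4"), ("g1", "g5")]   # g5 is unclustered, so skipped
print(conditional_entropy(pairs, labels), max_entropy(2))
```

Both usable toy pairs stay within a single cluster, so the partner's cluster is fully predictable and the conditional entropy is 0 against a 1-bit maximum. Note that a low value alone does not show that pairs share a cluster (consistently cross-cluster pairs also score low), so a real evaluation would likely inspect the joint counts as well.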
