Canadian Journal of Biotechnology

The unknown-unknowns: Revealing the hidden insights in massive biomedical data using combined artificial intelligence and knowledge networks


Abstract

Genomic data is estimated to double every seven months, with over 2 trillion bases from whole-genome sequencing studies deposited in GenBank in the last 15 years alone. Recent advances in compute and storage have enabled the use of artificial intelligence (A.I.) techniques in areas such as feature recognition in digital pathology and chemical synthesis for drug development. To apply A.I. productively to multidimensional data, such as cellular processes and their dysregulation, the data must first be transformed into a structured format that uses prior knowledge to create the contextual relationships and hierarchies on which computational analysis can be performed. Here we present the organization of complex data into hypergraphs that facilitate the application of A.I. We provide an example use case: a hypergraph containing hundreds of biological data values, together with the results of several classes of A.I. algorithms applied to it in a popular compute cloud. Although multiple biologically insightful correlations between disease states, behavior, and molecular features were identified, the insights of scientific import emerged only when data exploration included visualization of subgraphs of the represented knowledge. The results suggest that while machine learning can identify known correlations and suggest testable ones, achieving a greater probability of discovering unexpected relationships between seemingly independent variables (unknown-unknowns) requires a context-aware system: hypergraphs that impart biological meaning to nodes and edges. We discuss the implications of a combined hypergraph-A.I. approach to analyzing multidimensional data and the pre-processing requirements for such a system.
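The abstract does not describe the authors' hypergraph schema, so the following is only a minimal sketch, under stated assumptions, of what "organizing complex data into hypergraphs" with biologically meaningful nodes and edges could look like. It assumes typed nodes (e.g. gene, disease, behavior) and hyperedges labeled with a relation, where a single hyperedge can join more than two entities. Two operations are added as illustrations: an incidence matrix (one possible way to hand the structure to generic A.I./ML algorithms) and a k-hop sub-hypergraph extraction (one possible way to pull out subgraphs for visualization). All class names, method names, node IDs, and relation labels are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    id: str
    kind: str  # hypothetical category, e.g. "gene", "disease", "behavior"


@dataclass(frozen=True)
class HyperEdge:
    id: str
    relation: str        # hypothetical biological meaning, e.g. "co_dysregulated_in"
    members: frozenset   # IDs of all nodes joined by this hyperedge (can be > 2)


class Hypergraph:
    """Minimal typed hypergraph sketch; not the authors' implementation."""

    def __init__(self):
        self.nodes = {}
        self.edges = {}
        self._incidence = defaultdict(set)  # node id -> ids of hyperedges containing it

    def add_node(self, node_id, kind):
        self.nodes[node_id] = Node(node_id, kind)

    def add_edge(self, edge_id, relation, members):
        missing = [m for m in members if m not in self.nodes]
        if missing:
            raise KeyError(f"unknown nodes: {missing}")
        self.edges[edge_id] = HyperEdge(edge_id, relation, frozenset(members))
        for m in members:
            self._incidence[m].add(edge_id)

    def neighbors(self, node_id):
        """Return all nodes that share at least one hyperedge with node_id."""
        out = set()
        for eid in self._incidence[node_id]:
            out |= self.edges[eid].members
        out.discard(node_id)
        return out

    def subgraph(self, seed, hops=1):
        """Return the sub-hypergraph of nodes within `hops` of seed, keeping
        only hyperedges whose members all fall inside that node set."""
        frontier, seen = {seed}, {seed}
        for _ in range(hops):
            reached = set()
            for n in frontier:
                reached |= self.neighbors(n)
            frontier = reached - seen
            seen |= reached
        sub = Hypergraph()
        for n in seen:
            sub.add_node(n, self.nodes[n].kind)
        for e in self.edges.values():
            if e.members <= seen:
                sub.add_edge(e.id, e.relation, e.members)
        return sub

    def incidence_matrix(self):
        """Return (row node ids, column edge ids, 0/1 node-by-hyperedge matrix)."""
        rows, cols = sorted(self.nodes), sorted(self.edges)
        matrix = [[1 if r in self.edges[c].members else 0 for c in cols] for r in rows]
        return rows, cols, matrix


# Usage with hypothetical entities (not data from the study).
hg = Hypergraph()
for nid, kind in [("GENE_A", "gene"), ("GENE_B", "gene"), ("GENE_C", "gene"),
                  ("DISEASE_X", "disease"), ("BEHAVIOR_Y", "behavior")]:
    hg.add_node(nid, kind)
hg.add_edge("e1", "co_dysregulated_in", ["GENE_A", "GENE_B", "DISEASE_X"])
hg.add_edge("e2", "associated_with", ["DISEASE_X", "BEHAVIOR_Y"])
hg.add_edge("e3", "interacts_with", ["GENE_B", "GENE_C"])

local = hg.subgraph("GENE_A", hops=1)  # nodes GENE_A, GENE_B, DISEASE_X; edge e1 only
```

Keeping the relation label on each hyperedge, rather than reducing the data to an unlabeled numeric matrix, is what lets an extracted subgraph carry its biological context into visualization. The incidence matrix is one lossy projection of that structure that conventional ML tools can consume.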