Journal: International Journal of Online Engineering

Mining Inter-Relationships in Online Scientific Articles and its Visualization: Natural Language Processing for Systems Biology Modeling



Abstract

With the rapid growth in the number of scientific publications in domains such as neuroscience and medicine, visually interlinking documents in online databases such as PubMed to indicate the context of a query's results can improve the multi-disciplinary relevance of those results. Translational medicine and systems biology rely on studies that relate basic sciences to applications, often spanning multiple disciplinary domains. This paper focuses on the design and development of a new scientific document visualization platform, which allows translational aspects of the biosciences to be inferred from published articles using machine learning and natural language processing (NLP) methods. From online databases, this software platform extracted relationship connections between multiple sub-domains within neuroscience, derived from abstracts related to the user query. In our current implementation, the document visualization platform employs two clustering algorithms, namely Suffix Tree Clustering (STC) and LINGO. Clustering quality was improved by mapping top-ranked cluster labels derived from the UMLS Metathesaurus using a scoring function. To avoid leaving documents unclustered, an iterative scheme called auto-clustering was developed; it maps documents that were left uncategorized during the initial grouping process to relevant clusters. The efficacy of this document clustering and visualization platform was evaluated by expert-based validation of clustering results obtained with unique search terms. Compared to normal clustering, auto-clustering demonstrated better efficacy by generating larger numbers of unique and relevant cluster labels. Using this implementation, a Parkinson’s disease systems theory model was developed, and studies based on user queries related to neuroscience and oncology are showcased as applications.
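The abstract does not spell out how auto-clustering works, so the following is only a minimal sketch of the general idea of iteratively re-clustering leftover documents. Everything in it is an assumption made for illustration: the function names `cluster_once` and `auto_cluster`, the toy shared-word grouping (a stand-in, not STC or LINGO), and the rule of relaxing the minimum cluster size on each round.

```python
from collections import defaultdict


def cluster_once(docs, min_size):
    """Toy stand-in for a real clusterer (NOT STC or LINGO).

    A word becomes a cluster label when at least `min_size` documents
    contain it. Returns (clusters, leftover_ids).
    """
    by_word = defaultdict(set)
    for doc_id, text in docs.items():
        for word in set(text.lower().split()):
            by_word[word].add(doc_id)
    clusters = {w: ids for w, ids in by_word.items() if len(ids) >= min_size}
    covered = set().union(*clusters.values()) if clusters else set()
    leftover = set(docs) - covered
    return clusters, leftover


def auto_cluster(docs, min_size=3, max_rounds=5):
    """Hypothetical iterative scheme: re-cluster only the documents that
    the previous round left unassigned, until none remain or the round
    limit is reached."""
    all_clusters = {}
    remaining = dict(docs)
    for _ in range(max_rounds):
        if not remaining:
            break
        clusters, leftover = cluster_once(remaining, min_size)
        for label, ids in clusters.items():
            all_clusters.setdefault(label, set()).update(ids)
        remaining = {doc_id: docs[doc_id] for doc_id in leftover}
        # Assumed relaxation rule: accept smaller clusters each round.
        min_size = max(1, min_size - 1)
    return all_clusters, set(remaining)
```

Calling `auto_cluster` on a dictionary that maps document ids to abstract text returns the accumulated clusters plus any ids that were still unassigned when the rounds ran out. In a real pipeline the toy `cluster_once` would be replaced by an actual STC or LINGO run over the leftover abstracts.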
