IEEE International Conference on Bioinformatics and Biomedicine

Mining meaningful topics from massive biomedical literature


Abstract

A huge amount of biomedical and biological literature is available online or in digital libraries, and the number of newly published research papers has grown exponentially in recent years. Mining meaningful topics from this massive body of literature is therefore both pressing and challenging. The mined topics help researchers with literature exploration and topic discovery. However, the latent topics inferred by traditional topic models are not always coherent or meaningful. In this work, we propose a new methodology for mining meaningful biomedical topics that combines several off-the-shelf text mining techniques, such as part-of-speech tagging, base noun phrase chunking, K-means clustering, and latent Dirichlet allocation, which makes the methodology scalable and simple to implement. We conduct comprehensive experiments on a dataset collected from PubMed. The experimental results demonstrate that our method significantly outperforms a baseline method. We also perform a qualitative analysis and present meaningful biomedical topics and multi-word expressions.
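The abstract names the building blocks but not how they are wired together, so the following is only a rough, standard-library sketch of one possible arrangement: extract multi-word candidate phrases, then run latent Dirichlet allocation over phrases instead of single words. Everything here is an assumption for illustration. The stopword-based splitter is a crude stand-in for part-of-speech tagging and base noun phrase chunking, the function names `candidate_phrases` and `lda_gibbs` are hypothetical, the toy texts are invented, and K-means clustering is left out because the abstract does not say what is clustered.

```python
import random
import re
from collections import Counter

# Small illustrative stopword list; stopwords act as phrase boundaries.
# This is a stand-in for POS tagging + base noun phrase chunking.
STOPWORDS = {"the", "a", "an", "of", "in", "and", "is", "are",
             "with", "for", "to", "by", "on"}

def candidate_phrases(text):
    """Split text at stopwords, digits, and punctuation; keep the word runs between them."""
    phrases, current = [], []
    for token in re.findall(r"[a-z]+|[^a-z\s]", text.lower()):
        if token.isalpha() and token not in STOPWORDS:
            current.append(token)
        else:
            if current:
                phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases

def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; each phrase is treated as one token."""
    rng = random.Random(seed)
    vocab_size = len({p for doc in docs for p in doc})
    doc_topic = [[0] * num_topics for _ in docs]
    topic_phrase = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assign = []
    # Random initial topic assignment for every phrase occurrence.
    for d, doc in enumerate(docs):
        zs = []
        for p in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_phrase[z][p] += 1
            topic_total[z] += 1
        assign.append(zs)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, p in enumerate(doc):
                # Remove the current assignment, then resample it.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_phrase[z][p] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_phrase[k][p] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_phrase[z][p] += 1
                topic_total[z] += 1
    # Report up to three most frequent phrases per topic.
    return [[p for p, c in tp.most_common(3) if c > 0] for tp in topic_phrase]

# Invented toy texts, not data from the paper.
texts = [
    "Gene therapy for breast cancer and the role of tumor suppressor genes.",
    "Tumor suppressor genes in breast cancer with gene therapy.",
    "Insulin resistance in type two diabetes and blood glucose control.",
    "Blood glucose control with insulin for diabetes patients.",
]
docs = [candidate_phrases(t) for t in texts]
topics = lda_gibbs(docs, num_topics=2)
for k, top in enumerate(topics):
    print(f"topic {k}: {top}")
```

Treating each phrase as a single token is what lets the reported topics contain multi-word expressions such as "breast cancer" rather than isolated words. On a corpus of realistic size, a proper tagger, chunker, and LDA library would replace these toy components.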
