International Journal of Data Mining and Bioinformatics

BioTopic: a topic-driven biological literature mining system

Abstract

Biology and biomedicine are flourishing disciplines, with massive amounts of biological data produced in experiments and a huge number of research papers published in journals. In such a big-data context, unsupervised data mining methods such as topic models are used to extract topics from large-scale document collections. In this paper, we present BioTopic, a biological literature mining system based on topic modelling. Our method employs linguistic information from shallow parsing to better pre-process biological literature for topic models. Experiments show that the perplexity reduction percentage of our pre-processing method is 5% larger than that of a traditional pre-processing method. The precision of our search reaches 86%, which is better than that of a unigram language model. BioTopic, with fine-grained pre-processing and topic modelling, works better than traditional literature mining systems.