The efficiency of corpus-based distributional models for literature-based discovery on large data sets

Abstract

This paper evaluates the efficiency of a number of popular corpus-based distributional models in performing discovery on very large document sets, including online collections. Literature-based discovery is the process of identifying previously unknown connections from text, often published literature, that could lead to the development of new techniques or technologies.

Literature-based discovery has attracted growing research interest ever since Swanson's serendipitous discovery of the therapeutic effects of fish oil on Raynaud's disease in 1986.

The successful application of distributional models in automating the identification of indirect associations underpinning literature-based discovery has been widely demonstrated in the medical domain. However, we wish to investigate the computational complexity of distributional models for literature-based discovery on much larger document collections, as they may provide computationally tractable solutions to tasks such as predicting future disruptive innovations.

In this paper we perform a computational complexity analysis on four successful corpus-based distributional models to evaluate their fit for such tasks. Our results indicate that corpus-based distributional models that store their representations in fixed dimensions provide superior efficiency on literature-based discovery tasks.
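
To make the closing claim concrete, the sketch below contrasts a fixed-dimension representation with a plain term-by-term co-occurrence table. It is an illustration only, not the paper's method: the abstract does not name the four models, so the Random Indexing style scheme used here, the toy corpus, and the names `DIM`, `NONZERO`, `WINDOW`, `index_vector`, `build_fixed`, and `build_cooccurrence` are all assumptions.

```python
import random
from collections import defaultdict

# All three constants are assumed values chosen for illustration.
DIM = 64       # fixed length of every context vector
NONZERO = 4    # number of +1/-1 entries in each random index vector
WINDOW = 2     # context window radius, in tokens

# Toy corpus (made up), loosely echoing the fish oil / Raynaud example.
DOCS = [
    ["fish", "oil", "reduces", "blood", "viscosity"],
    ["raynaud", "disease", "involves", "high", "blood", "viscosity"],
]


def index_vector(term, dim=DIM, nonzero=NONZERO):
    """Return a sparse ternary random vector that is reproducible per term."""
    rng = random.Random(term)  # seeding with the term string fixes the vector
    vec = [0] * dim
    for pos in rng.sample(range(dim), nonzero):
        vec[pos] = rng.choice((-1, 1))
    return vec


def neighbours(tokens, i, window=WINDOW):
    """Yield the tokens within `window` positions of index i, excluding i."""
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    for j in range(lo, hi):
        if j != i:
            yield tokens[j]


def build_fixed(docs):
    """Fixed-dimension model: every term is stored in exactly DIM numbers."""
    context = defaultdict(lambda: [0] * DIM)
    for tokens in docs:
        for i, term in enumerate(tokens):
            for other in neighbours(tokens, i):
                iv = index_vector(other)
                cv = context[term]
                for k in range(DIM):
                    cv[k] += iv[k]
    return dict(context)


def build_cooccurrence(docs):
    """Explicit counts: a term's row can grow up to the vocabulary size."""
    counts = defaultdict(lambda: defaultdict(int))
    for tokens in docs:
        for i, term in enumerate(tokens):
            for other in neighbours(tokens, i):
                counts[term][other] += 1
    return {term: dict(row) for term, row in counts.items()}
```

In the fixed-dimension variant, storage per term stays at `DIM` numbers however many distinct terms the collection contains, whereas each row of the co-occurrence table can widen as the vocabulary grows. That difference is the kind of property the abstract's conclusion points to, though the paper's own analysis may measure it differently.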