Conference: International conference on computational science and technology

Creating Prior-Knowledge of Source-LDA for Topic Discovery in Citation Network


Abstract

Discovering and understanding how research topics develop within a community is useful for identifying important milestones and prominent research. Recent work on detecting topics in scientific corpora has used latent Dirichlet allocation (LDA) to explore the topics of papers. These systems usually use paper abstracts as the corpus instead of full papers. However, LDA is based on the bag-of-words model, so it gives low accuracy on such short texts. The current trend for improvement is to add prior knowledge to the analysis using Source-LDA, an algorithm presented by Justin Wood et al. at UCLA in 2017. We found that Source-LDA has some shortcomings. First, like LDA, it is based on word counting, so short texts decrease its accuracy. Second, the knowledge source used by the algorithm is constructed manually from labeled text data, which makes Source-LDA a supervised method. Therefore, we propose an approach that automatically constructs the knowledge source for Source-LDA from unlabeled data, under the assumption that a paper often cites papers on related topics. This approach both integrates source knowledge in an unsupervised manner and addresses the short-text issue by using information from the citation network. In the first stage, the proposed method has achieved encouraging results.
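
The abstract does not spell out how the knowledge source is built, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes each paper is identified by an ID, that abstracts are plain strings, and that the citation network is a mapping from a paper to the IDs it cites. For each paper it pools the words of the abstracts it cites into a normalized word distribution that could serve as a candidate knowledge source. The names `tokenize` and `build_knowledge_source` and the toy data are hypothetical.

```python
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace, keep purely alphabetic tokens."""
    return [w for w in text.lower().split() if w.isalpha()]


def build_knowledge_source(abstracts, citations):
    """Pool the abstracts cited by each paper into a word distribution.

    abstracts: dict mapping paper ID -> abstract text
    citations: dict mapping paper ID -> list of cited paper IDs
    Returns a dict mapping paper ID -> {word: probability}.
    Assumption (from the abstract): a paper tends to cite papers on
    related topics, so cited abstracts extend its short text.
    """
    sources = {}
    for paper_id in abstracts:
        counts = Counter()
        for cited_id in citations.get(paper_id, []):
            # Skip citations whose abstract is not in the corpus.
            if cited_id in abstracts:
                counts.update(tokenize(abstracts[cited_id]))
        total = sum(counts.values())
        # Normalize counts to probabilities; empty if nothing was pooled.
        sources[paper_id] = {w: c / total for w, c in counts.items()}
    return sources


if __name__ == "__main__":
    # Toy corpus, invented purely for illustration.
    abstracts = {
        "a": "topic models for text",
        "b": "latent topic models",
        "c": "graph mining",
    }
    citations = {"a": ["b", "c"], "b": [], "c": ["missing"]}
    for paper_id, dist in build_knowledge_source(abstracts, citations).items():
        print(paper_id, dist)
```

Pooling by simple counting keeps the sketch unsupervised: no labels are needed, only the citation links. How such distributions would be fed into Source-LDA as priors is not described in the abstract and is left open here.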
