Creating Prior-Knowledge of Source-LDA for Topic Discovery in Citation Network

Abstract

Discovering and understanding how research topics develop within a community is useful for identifying important milestones and prominent research. Recent work on detecting topics in scientific corpora has used latent Dirichlet allocation (LDA) to explore the topics of papers. These systems usually use paper abstracts as the corpus instead of full papers. However, LDA is based on the bag-of-words model, so it gives low accuracy on such short texts. One direction for improvement is to add prior knowledge to the analysis with a recent algorithm, Source-LDA, presented by Justin Wood et al. at UCLA in 2017. We found that Source-LDA has some shortcomings. First, like LDA, it relies on word counts, so short texts reduce its accuracy. Second, the knowledge source used by the algorithm is constructed manually from labeled text data, which makes Source-LDA a supervised method. We therefore propose an approach that automatically constructs the knowledge source for Source-LDA from unlabeled data, under the assumption that a paper often cites papers that cover related topics. This approach both integrates source knowledge in an unsupervised manner and mitigates the short-text problem by using information from the citation network. In its first stage, the proposed method has achieved encouraging results.
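
The idea of deriving a knowledge source from the citation network can be illustrated with a minimal sketch. The abstract does not describe the construction in detail, so everything below is an assumption made for illustration and not the authors' method: the function names `tokenize` and `build_knowledge_source`, the toy corpus, and the choice to pool the abstracts of the papers a given paper cites into one normalized word distribution per citing paper.

```python
from collections import Counter
import re


def tokenize(text):
    # Lowercase and keep alphabetic tokens only (a deliberately simple tokenizer).
    return re.findall(r"[a-z]+", text.lower())


def build_knowledge_source(abstracts, citations):
    """Return {paper_id: {word: probability}} pooled from each paper's cited abstracts.

    Hypothetical helper: it assumes a paper's cited papers share its topics,
    so their pooled vocabulary can stand in for a hand-labeled knowledge source.
    """
    sources = {}
    for paper_id, cited_ids in citations.items():
        counts = Counter()
        for cited_id in cited_ids:
            # Citations whose abstract is missing from the corpus contribute nothing.
            counts.update(tokenize(abstracts.get(cited_id, "")))
        total = sum(counts.values())
        if total:
            # Normalize raw counts into a word distribution.
            sources[paper_id] = {w: c / total for w, c in counts.items()}
    return sources


# Toy corpus (invented for this example).
abstracts = {
    "p1": "Topic models for short text",
    "p2": "Latent Dirichlet allocation topic models",
    "p3": "Citation network analysis",
}
citations = {"p1": ["p2", "p3"], "p3": []}

sources = build_knowledge_source(abstracts, citations)
```

In this sketch, a paper with no usable citations (such as `p3`) gets no knowledge source at all. How the resulting distributions would be passed to Source-LDA as priors is not specified in the abstract and is left open here.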

Bibliographic record

Source
International Conference on Computational Science and Technology | 2018 | x, 466 p. | 11 pages
Authors
Ho Duy Tri Nguyen; Trac Thuc Nguyen; Phuc Do
Original format
PDF
Chinese Library Classification
Computing technology, computer technology
Keywords
Citation network; Topic modeling; Source-LDA; Knowledge source; LDA model
