Topic detection model in a single-domain corpus inspired by the human memory cognitive process

Abstract

A corpus (e.g., patents or news texts) is an important knowledge resource that contains various topics, such as specific technologies or social events. Topic detection models of corpora, e.g., Latent Dirichlet Allocation (LDA) and KeyGraph, provide an important basis for exploring the status quo and trends in science, technology, or social events. However, these models suffer from low retrieval performance in a single-domain corpus because they consider only the explicit semantics of the texts themselves. In addition, many incremental models, such as online-LDA, are based on time slices. In this paper, we propose a new topic detection model inspired by the human memory cognitive process (THC) to improve topic detection performance in a single-domain corpus. First, to improve accuracy, we use the distributions over words and the inter-word relations across the corpus as background knowledge, a type of implicit semantics, which allows us to find the more semantically sensitive parts of texts. Second, to realize online topic detection without time slices, we introduce a dynamic probabilistic model based on probability gain, which detects latent topics by learning a model of the dynamic human memory cognitive process. These two steps constitute the framework of our model. Experimental results on four public datasets (Reuters-R8, Reuters-R52, WebKB, and Cade12) show that our model scores approximately ten percent higher than the baselines (e.g., KeyGraph and LDA) on the Adjusted Rand Index (ARI).
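
The first step above treats corpus-wide word distributions and inter-word relations as background knowledge. The paper's exact formulation is not given here, so the following Python is only a minimal sketch under stated assumptions: the background distribution is taken to be corpus-wide unigram frequency, inter-word relations are approximated by document-level co-occurrence counts, and the hypothetical `probability_gain` function scores a word by the log ratio of its in-document probability to its background probability. This is an illustrative stand-in, not the authors' method.

```python
import math
from collections import Counter
from itertools import combinations


def background_stats(corpus):
    """Corpus-wide word distribution plus document-level co-occurrence counts.

    Assumption: co-occurrence counts stand in for "inter-word relations".
    """
    word_counts = Counter()
    pair_counts = Counter()
    for doc in corpus:
        word_counts.update(doc)
        # Count each unordered word pair at most once per document;
        # sorting makes the pair key order-independent.
        for a, b in combinations(sorted(set(doc)), 2):
            pair_counts[(a, b)] += 1
    total = sum(word_counts.values())
    p_word = {w: c / total for w, c in word_counts.items()}
    return p_word, pair_counts


def probability_gain(doc, p_word):
    """Hypothetical gain: log(P(w | doc) / P(w | corpus)) for each word in doc.

    Every word in doc must occur in the corpus used to build p_word.
    """
    counts = Counter(doc)
    n = len(doc)
    return {w: math.log((c / n) / p_word[w]) for w, c in counts.items()}


def salient_words(doc, p_word, threshold=0.0):
    """Words more probable in this document than in the corpus background."""
    gains = probability_gain(doc, p_word)
    return sorted((w for w, g in gains.items() if g > threshold),
                  key=lambda w: -gains[w])
```

With `corpus = [["patent", "battery", "lithium", "battery"], ["patent", "claim", "method"], ["news", "election", "vote", "patent"]]`, `salient_words(corpus[0], p_word)` keeps `battery` and `lithium` and drops `patent`, which is common across the whole corpus and therefore has a negative gain.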
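
The evaluation above relies on the Adjusted Rand Index, which compares two partitions of the same documents while correcting for chance agreement. Assuming each document receives exactly one detected topic and one gold category label, a minimal Python sketch of the standard ARI formula is shown below. With contingency cells n_ij, row sums a_i, column sums b_j and expected index E = [sum C(a_i,2) * sum C(b_j,2)] / C(n,2), ARI = (sum C(n_ij,2) - E) / (0.5 * [sum C(a_i,2) + sum C(b_j,2)] - E).

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two flat clusterings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        # Fewer than two items: no pairs to compare, treat as perfect agreement.
        return 1.0
    # Contingency table: number of items sharing each (true, predicted) pair.
    contingency = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(comb(c, 2) for c in contingency.values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / total_pairs
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Only happens when both clusterings are all singletons or both a
        # single cluster, i.e. they agree perfectly.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)
```

For example, `adjusted_rand_index([0, 0, 1, 2], [0, 0, 1, 1])` returns 4/7 (about 0.571), and renaming topic labels does not change the score. scikit-learn's `sklearn.metrics.adjusted_rand_score` implements the same index.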

Bibliographic details

  • Source
    Concurrency and computation: practice and experience | 2018, Issue 19 | e4642.1-e4642.15 | 15 pages
  • Author affiliations

    Shanghai Institute for Advanced Communication and Data Science, School of Computer Engineering and Science, Shanghai University, Shanghai, China

  • Original format: PDF
  • Language: English
  • Keywords

    memory cognitive process; probability gain; topic detection;