Journal: Journal of intelligent & fuzzy systems: Applications in Engineering and Technology

Topic modeling in short-text using non-negative matrix factorization based on deep reinforcement learning



Abstract

Topic modeling for short texts is a challenging and interesting problem in the machine learning and knowledge discovery domains. Nowadays, millions of documents are published on the internet from various sources. Websites are full of topics and information, but there is a lot of similarity between the topics, contents, and overall quality of sources, which causes data repetition and gives the user the same information many times over. Another issue is data sparsity and ambiguity: because the length of a short text is limited, results are often unsatisfactory and irrelevant to end-users. These issues have made short texts an interesting subject for researchers who use machine learning and knowledge discovery techniques to discover underlying topics in massive amounts of data. In this paper, we propose a combination of deep reinforcement learning (RL) and a semantics-assisted non-negative matrix factorization (SeaNMF) model to extract meaningful underlying topics from short document contents. The main objective of this work is to reduce the problems of repetitive information and data sparsity in short texts, helping users get meaningful and relevant content. Furthermore, our proposed model reviews an issue of the Seq2Seq approach from the reinforcement learning perspective and provides a combination of reinforcement learning and the SeaNMF formulation using the block coordinate descent algorithm. Moreover, we compare results numerically on different real-world datasets against a couple of state-of-the-art models for short-text topic modeling. Based on experimental results and comparative analysis, our proposed model outperforms the state-of-the-art techniques in short document topic modeling.
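The abstract names block coordinate descent as the solver for the SeaNMF formulation but gives no further detail. As a rough illustration only, the sketch below applies a standard column-wise block coordinate descent (the HALS variant) to plain non-negative matrix factorization. It is an assumption-laden stand-in, not the paper's method: it omits the semantic word-context term of SeaNMF and the reinforcement learning component entirely, and the function name `nmf_bcd` and its parameters are hypothetical.

```python
import numpy as np

def nmf_bcd(X, k, n_iter=200, eps=1e-10, seed=0):
    """Approximate a non-negative matrix X (terms x documents) as W @ H.T.

    Hypothetical helper: plain NMF solved by HALS-style block coordinate
    descent. Each column of W (then of H) is one block, updated in closed
    form while all other blocks are held fixed.
    """
    rng = np.random.default_rng(seed)
    n_terms, n_docs = X.shape
    W = rng.random((n_terms, k))   # term-topic weights
    H = rng.random((n_docs, k))    # document-topic weights
    for _ in range(n_iter):
        # Update the columns of W with H fixed.
        XH = X @ H                 # shape (n_terms, k)
        HtH = H.T @ H              # shape (k, k)
        for j in range(k):
            step = (XH[:, j] - W @ HtH[:, j]) / (HtH[j, j] + eps)
            W[:, j] = np.maximum(W[:, j] + step, 0.0)  # project onto >= 0
        # Update the columns of H with W fixed.
        XtW = X.T @ W              # shape (n_docs, k)
        WtW = W.T @ W              # shape (k, k)
        for j in range(k):
            step = (XtW[:, j] - H @ WtW[:, j]) / (WtW[j, j] + eps)
            H[:, j] = np.maximum(H[:, j] + step, 0.0)
    return W, H

def relative_error(X, W, H):
    """Frobenius-norm reconstruction error relative to the norm of X."""
    return np.linalg.norm(X - W @ H.T) / np.linalg.norm(X)
```

In a topic-modeling setting, `X` would typically be a term-document count or TF-IDF matrix; the largest entries in each column of `W` then indicate the top terms of a topic, and each row of `H` gives a document's topic weights.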
