Conference: 2016 IEEE International Conference on Knowledge Engineering and Applications

A diversifying hidden units method based on NMF for document representation

Abstract

Document modeling with hidden units, also known as topics, is very popular. Non-negative matrix factorization (NMF) is one of the most important techniques in document representation; it decomposes a document-term matrix into a document-topic matrix and a topic-term matrix. Because an orthogonality constraint would restrict each term to a single topic, we abandon this strong constraint. Furthermore, to represent documents with a fixed number of topics that carry more semantic information, we add diversifying regularization and a sparsity constraint to NMF, which shows a great improvement in text classification and clustering. Finally, we plot the topic similarities and display the 20 highest-weighted words in each topic to show that diversifying regularization effectively reduces overlapping terms.
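
The abstract does not state the objective function or the update rules, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes an objective of the form 0.5·||X − WH||² + (alpha/2)·Σ_{i≠j} ⟨h_i, h_j⟩ + beta·ΣW, where the inner-product term is a stand-in for the diversifying regularization and the L1 term on W is a stand-in for the sparsity constraint, and it uses standard multiplicative updates. The function names, `alpha`, `beta`, and the choice of penalties are all assumptions made for illustration.

```python
import numpy as np


def objective(X, W, H, alpha, beta):
    """Assumed objective: reconstruction + topic-overlap penalty + L1 on W."""
    recon = 0.5 * np.linalg.norm(X - W @ H, "fro") ** 2
    gram = H @ H.T  # topic-by-topic inner products
    overlap = 0.5 * alpha * (gram.sum() - np.trace(gram))  # off-diagonal part only
    sparsity = beta * W.sum()  # L1 norm, since W is non-negative
    return recon + overlap + sparsity


def diversified_sparse_nmf(X, n_topics, alpha=0.1, beta=0.1,
                           n_iter=200, seed=0, eps=1e-10):
    """Factor X (documents x terms) into W (documents x topics) and H (topics x terms)."""
    rng = np.random.default_rng(seed)
    n_docs, n_terms = X.shape
    W = rng.random((n_docs, n_topics))
    H = rng.random((n_topics, n_terms))
    history = [objective(X, W, H, alpha, beta)]
    for _ in range(n_iter):
        # Gradient of the overlap penalty w.r.t. row i of H is the sum of the
        # other topic rows; it is non-negative, so it goes in the denominator.
        others = H.sum(axis=0, keepdims=True) - H
        H *= (W.T @ X) / (W.T @ W @ H + alpha * others + eps)
        # The L1 penalty on W adds the constant beta to the denominator.
        W *= (X @ H.T) / (W @ (H @ H.T) + beta + eps)
        history.append(objective(X, W, H, alpha, beta))
    return W, H, history


def topic_similarity(H, eps=1e-10):
    """Cosine similarity between topic rows, e.g. for a similarity heat map."""
    unit = H / (np.linalg.norm(H, axis=1, keepdims=True) + eps)
    return unit @ unit.T


def top_terms(H, vocab, n=20):
    """Return the n highest-weighted terms of each topic."""
    return [[vocab[j] for j in np.argsort(row)[::-1][:n]] for row in H]
```

Calling `diversified_sparse_nmf(X, n_topics=k)` on a non-negative document-term matrix returns the two factors plus the objective history; `topic_similarity(H)` and `top_terms(H, vocab)` correspond loosely to the similarity figure and the top-20 word lists mentioned in the abstract. Setting `alpha=0` removes the overlap penalty, which gives a baseline for comparing how much the topics overlap.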
