Information Sciences: An International Journal

Graph-induced restricted Boltzmann machines for document modeling

Abstract

Discovering knowledge from unstructured texts is a central theme in data mining and machine learning. We focus on fast discovery of thematic structures from a corpus. Our approach is based on a versatile probabilistic formulation, the restricted Boltzmann machine (RBM), whose underlying graphical model is an undirected bipartite graph. Inference is efficient: a document representation can be computed with a single matrix projection, making RBMs suitable for the massive text corpora available today. Standard RBMs, however, operate under the bag-of-words assumption, ignoring the inherent relational structures among words. This results in less coherent thematic grouping of words. We introduce graph-based regularization schemes that exploit linguistic structures, which in turn can be constructed from either corpus statistics or domain knowledge. We demonstrate that the proposed technique improves group coherence, facilitates visualization, provides a means of estimating intrinsic dimensionality, reduces overfitting, and possibly leads to better classification accuracy. (C) 2015 Elsevier Inc. All rights reserved.
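
The two mechanisms the abstract names, a single matrix projection for inference and a graph-based regularizer over words, can be illustrated with a small sketch. The projection uses the standard RBM conditional P(h_k = 1 | v) = sigmoid(b_k + sum_i v_i W_ik). The abstract does not give the form of the regularizer, so the penalty below (a weighted squared difference between the weight rows of linked words) is only an assumption, and the names `hidden_representation`, `graph_penalty`, and `edges` are hypothetical. Plain Python lists are used to keep the example dependency-free.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hidden_representation(v, W, b):
    """Hidden activations P(h_k = 1 | v) of a standard RBM.

    v: visible vector of length N (e.g. word indicators for one document)
    W: N x K weight matrix, given as a list of N rows
    b: hidden bias vector of length K
    """
    return [
        sigmoid(b[k] + sum(v[i] * W[i][k] for i in range(len(v))))
        for k in range(len(b))
    ]


def graph_penalty(W, edges):
    """Assumed smoothness penalty (not taken from the paper):
    sum over word-graph edges (i, j, s) of s * ||W[i] - W[j]||^2,
    which is small when linked words have similar weight rows."""
    return sum(
        s * sum((wi - wj) ** 2 for wi, wj in zip(W[i], W[j]))
        for i, j, s in edges
    )


# Toy setup: 3 words, 2 hidden units.
W = [[1.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
b = [0.0, 0.0]
doc = [1, 1, 0]                      # the document contains words 0 and 1
edges = [(0, 1, 1.0), (1, 2, 0.5)]   # hypothetical word-similarity graph

h = hidden_representation(doc, W, b)
penalty = graph_penalty(W, edges)
print(h, penalty)
```

In this toy setup, words 0 and 1 share identical weight rows and contribute nothing to the penalty, while the edge between words 1 and 2 does; adding such a term to the training objective would pull linked words toward the same hidden units.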
