
Learning Distributed Representations for Statistical Language Modelling and Collaborative Filtering.



Abstract

With the increasing availability of large datasets, machine learning techniques are becoming an increasingly attractive alternative to expert-designed approaches to solving complex problems in domains where data is abundant. In this thesis we introduce several models for large sparse discrete datasets. Our approach, which is based on probabilistic models that use distributed representations to alleviate the effects of data sparsity, is applied to statistical language modelling and collaborative filtering.

We introduce three probabilistic language models that represent words using learned real-valued vectors. Two of the models are based on the Restricted Boltzmann Machine (RBM) architecture, while the third one is a simple deterministic model. We show that the deterministic model outperforms the widely used n-gram models and learns sensible word representations.

To reduce the time complexity of training and making predictions with the deterministic model, we introduce a hierarchical version of the model that can be exponentially faster. The speedup is achieved by structuring the vocabulary as a tree over words and taking advantage of this structure. We propose a simple feature-based algorithm for automatically constructing trees over words from data and show that the resulting models can outperform non-hierarchical neural models as well as the best n-gram models.

We then turn our attention to collaborative filtering and show how RBM models can be used to model the distribution of sparse high-dimensional user rating vectors efficiently, presenting inference and learning algorithms that scale linearly in the number of observed ratings. We also introduce the Probabilistic Matrix Factorization (PMF) model, which is based on the probabilistic formulation of the low-rank matrix approximation problem for partially observed matrices. The two models are then extended to allow conditioning on the identities of the rated items, whether or not the actual rating values are known. Our results on the Netflix Prize dataset show that both the RBM and PMF models outperform online SVD models.
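
The hierarchical language model described above replaces a flat prediction over the whole vocabulary with a sequence of binary decisions along a word's path in a tree, which is where the exponential speedup comes from. The toy sketch below illustrates only that idea; the vocabulary, tree codes, vectors, and dimensionality are invented for the example and are not taken from the thesis.

    import numpy as np

    # Toy binary tree over a 4-word vocabulary. Each word is reached from the
    # root by a sequence of (internal node id, branch) decisions; the words,
    # codes, and dimensionality are illustrative, not from the thesis.
    paths = {
        "the": [(0, 0), (1, 0)],
        "cat": [(0, 0), (1, 1)],
        "sat": [(0, 1), (2, 0)],
        "mat": [(0, 1), (2, 1)],
    }

    dim = 8
    rng = np.random.default_rng(0)
    node_vectors = rng.normal(scale=0.1, size=(3, dim))  # one vector per internal node

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def word_probability(context, word):
        # P(word | context) is a product of binary decisions along the word's
        # path, so the cost is O(tree depth) rather than O(vocabulary size).
        prob = 1.0
        for node, branch in paths[word]:
            p_left = sigmoid(context @ node_vectors[node])
            prob *= p_left if branch == 0 else 1.0 - p_left
        return prob

    context = rng.normal(scale=0.1, size=dim)  # stand-in for a learned context representation
    print(sum(word_probability(context, w) for w in paths))  # ~1.0 over the leaves of the tree

Because each word's probability is a product of at most tree-depth factors, evaluating or training the model costs O(log V) per prediction instead of O(V) for a flat distribution over V words.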
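At the MAP level, the Probabilistic Matrix Factorization model mentioned above reduces to a regularized low-rank factorization fitted only to the observed entries of the rating matrix. A minimal sketch on a toy matrix follows; the data, rank, learning rate, and regularization strength are illustrative assumptions rather than settings from the thesis.

    import numpy as np

    # Toy partially observed rating matrix; zeros mark missing entries.
    R = np.array([[5., 3., 0., 1.],
                  [4., 0., 0., 1.],
                  [1., 1., 0., 5.],
                  [0., 0., 5., 4.]])
    observed = R > 0

    rank, lam, lr = 2, 0.05, 0.01
    rng = np.random.default_rng(0)
    U = rng.normal(scale=0.1, size=(R.shape[0], rank))  # user factors
    V = rng.normal(scale=0.1, size=(R.shape[1], rank))  # item factors

    for _ in range(2000):
        E = observed * (R - U @ V.T)   # reconstruction error on observed ratings only
        U += lr * (E @ V - lam * U)    # gradient step on the regularized objective
        V += lr * (E.T @ U - lam * V)

    print(np.round(U @ V.T, 2))        # predicted ratings, including the missing cells

The quadratic penalties on U and V play the role of zero-mean Gaussian priors on the factor matrices in the probabilistic formulation.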

Bibliographic details

  • Author

    Mnih, Andriy

  • Author affiliation

    University of Toronto (Canada)

  • Degree-granting institution: University of Toronto (Canada)
  • Subject: Artificial Intelligence; Computer Science
  • Degree: Ph.D.
  • Year: 2010
  • Pages: 137 p.
  • Total pages: 137
  • Original format: PDF
  • Language: eng
  • CLC classification:
  • Keywords:

