
SYSTEM AND METHOD FOR FEATURE-RICH CONTINUOUS SPACE LANGUAGE MODELS



Abstract

Disclosed herein are systems, methods, and non-transitory computer-readable storage media for predicting probabilities of words for a language model. An exemplary system configured to practice the method receives a sequence of words and external data associated with the sequence of words, and maps each word in the sequence to an X-dimensional vector, where X corresponds to the vocabulary size. The system then processes each X-dimensional vector, based on the external data, to generate a respective Y-dimensional vector in a dense continuous space, and outputs at least one next word predicted to follow the sequence of words based on the respective Y-dimensional vectors. The X-dimensional vector, a sparse binary representation, can be higher-dimensional than the dense, continuous Y-dimensional vector. The external data can include part-of-speech tags, topic information, word similarity, word relationships, a particular topic, and succeeding parts of speech in a given history.
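The pipeline in the abstract (sparse word vector, dense projection informed by external data, next-word prediction) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the patent: the toy vocabulary, the tag set, the dimensions, the choice to bring in the external data by concatenating a dense part-of-speech vector to each dense word vector, and the single tanh layer. The weights are random and untrained, so the predicted word carries no meaning; only the shapes and the data flow are the point.

```python
import math
import random

# Hypothetical toy vocabulary and part-of-speech tag set (not from the patent).
VOCAB = ["the", "cat", "sat", "on", "mat"]
POS_TAGS = ["DET", "NOUN", "VERB", "ADP"]

X = len(VOCAB)      # size of the sparse binary (one-hot) word vector
Y = 3               # size of the dense word vector, chosen so that Y < X
F = 2               # size of the dense feature vector (assumed)
CONTEXT = 2         # number of history words fed to the model (assumed)

rng = random.Random(0)  # fixed seed so the sketch is reproducible


def rand_matrix(rows, cols):
    """Return a rows x cols matrix of small random weights (untrained)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def one_hot(index, size):
    """Sparse binary vector with a single 1 at the given index."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def matvec(matrix, vec):
    """Multiply a rows x cols matrix by a vector of length cols."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def softmax(scores):
    """Turn arbitrary scores into a probability distribution."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


WORD_PROJ = rand_matrix(Y, X)               # maps X-dim one-hot to Y-dim dense
TAG_PROJ = rand_matrix(F, len(POS_TAGS))    # maps tag one-hot to F-dim dense
OUTPUT = rand_matrix(X, CONTEXT * (Y + F))  # maps hidden layer to vocabulary scores


def next_word_probs(words, tags):
    """Return a probability for every vocabulary word given the history."""
    dense = []
    for word, tag in zip(words, tags):
        dense += matvec(WORD_PROJ, one_hot(VOCAB.index(word), X))
        dense += matvec(TAG_PROJ, one_hot(POS_TAGS.index(tag), len(POS_TAGS)))
    hidden = [math.tanh(v) for v in dense]
    return softmax(matvec(OUTPUT, hidden))


probs = next_word_probs(["the", "cat"], ["DET", "NOUN"])
prediction = VOCAB[max(range(X), key=probs.__getitem__)]
```

Concatenation is only one plausible way to condition on external data; the abstract states that the dense vectors are generated "based on the external data" without fixing the mechanism.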


Bibliographic data

Publication number: US2012150532A1

Patent type:
Publication date: 2012-06-14

Original format: PDF
Applicant/patentee: PIOTR WOJCIECH MIROWSKI; SRINIVAS BANGLORE; SUHRID BALAKRISHNAN; SUMIT CHOPRA

Application number: US20100963161
Inventors: PIOTR WOJCIECH MIROWSKI; SRINIVAS BANGLORE; SUHRID BALAKRISHNAN; SUMIT CHOPRA

Filing date: 2010-12-08
Classification: G06F17/27
Country: US
Added to database: 2022-08-21 17:34:42
