Method and Apparatus for Processing Text with Variations in Vocabulary Usage

Abstract

Text is processed to construct a model of the text. The text has a shared vocabulary. The text is partitioned into sets and subsets of texts. The usage of the shared vocabulary in two or more sets is different, and the topics of two or more subsets are different. A probabilistic model is defined for the text. The probabilistic model considers each word in the text to be a token having a position and a word value, and the usage of the shared vocabulary, topics, subtopics, and word values for each token in the text are represented using distributions of random variables in the probabilistic model, wherein the random variables are discrete. Parameters are estimated for the model corresponding to the vocabulary usages, the word values, the topics, and the subtopics associated with the words.
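
The abstract describes each word as a token with a position and a word value, with vocabulary usage, topic, subtopic, and word value each represented by a discrete random variable. A minimal generative sketch of that idea follows. The dependency structure (subtopic drawn given topic, word drawn given usage and subtopic), the function names, and all probability tables are illustrative assumptions, not details taken from the patent, and parameter estimation is not shown.

```python
import random

# Hypothetical shared vocabulary; word values are indices into this list.
VOCAB = ["car", "automobile", "engine", "motor"]


def sample_categorical(probs, rng):
    """Draw an index from a discrete distribution given as a list of weights."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_tokens(n_tokens, usage_probs, topic_probs, subtopic_probs, word_probs, rng):
    """Generate tokens for one text.

    usage_probs          -- distribution over vocabulary usages (assumed per set)
    topic_probs          -- distribution over topics (assumed per subset)
    subtopic_probs[t]    -- distribution over subtopics given topic t (assumption)
    word_probs[u][s]     -- distribution over word values given usage u and
                            subtopic s (assumption)
    """
    tokens = []
    for position in range(n_tokens):
        usage = sample_categorical(usage_probs, rng)
        topic = sample_categorical(topic_probs, rng)
        subtopic = sample_categorical(subtopic_probs[topic], rng)
        word = sample_categorical(word_probs[usage][subtopic], rng)
        tokens.append({
            "position": position,
            "usage": usage,
            "topic": topic,
            "subtopic": subtopic,
            "word": word,
        })
    return tokens


# Illustrative tables: two usages, two topics, two subtopics, four word values.
EXAMPLE_SUBTOPIC_PROBS = [[0.8, 0.2], [0.3, 0.7]]
EXAMPLE_WORD_PROBS = [
    [[0.7, 0.1, 0.1, 0.1], [0.1, 0.1, 0.7, 0.1]],  # usage 0
    [[0.1, 0.7, 0.1, 0.1], [0.1, 0.1, 0.1, 0.7]],  # usage 1
]

if __name__ == "__main__":
    rng = random.Random(0)
    for tok in generate_tokens(5, [0.5, 0.5], [0.6, 0.4],
                               EXAMPLE_SUBTOPIC_PROBS, EXAMPLE_WORD_PROBS, rng):
        print(tok["position"], VOCAB[tok["word"]], tok)
```

In this sketch the two usages prefer different words for the same subtopic ("car" versus "automobile"), which is one way to picture sets of text that share a vocabulary but use it differently.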