Apparatus for generating a statistical model called class bi-multigram model with bigram dependencies assumed between adjacent sequences

Abstract

An apparatus is disclosed for generating a statistical class sequence model, called a class bi-multigram model, from input strings of discrete-valued units. Bigram dependencies are assumed between adjacent variable-length sequences of at most N units, and class labels are assigned to the sequences. The apparatus counts the number of times each sequence of units occurs, and the number of times each pair of sequences co-occurs, in the input training strings of units. An initial bigram probability distribution over all pairs of sequences is computed as the number of times the two sequences co-occur divided by the number of times the first sequence occurs in the input training string. The input sequences are then classified into a pre-specified number of classes. Further, an estimate of the bigram probability distribution of the sequences is calculated by using an EM algorithm to maximize the likelihood of the input training string computed with the input probability distributions. These processes are performed iteratively to generate the statistical class sequence model.
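The initialization step described in the abstract (count sequences, count co-occurring pairs, divide) can be illustrated with a minimal Python sketch. It is not the patented apparatus: the function name `initial_bigram_distribution`, the parameter `max_len` (standing in for N), and the reading of "co-occur" as "the second sequence starts immediately after the first ends" are all assumptions made for illustration. The classification and EM re-estimation steps are not shown.

```python
from collections import Counter


def initial_bigram_distribution(units, max_len):
    """Relative-frequency estimate of p(s2 | s1) for adjacent sequences.

    Hypothetical helper: `units` is one training string given as a list of
    discrete symbols, and `max_len` plays the role of N in the abstract.
    """
    seq_counts = Counter()   # occurrences of each sequence s1
    pair_counts = Counter()  # occurrences of s1 immediately followed by s2
    n = len(units)
    for i in range(n):
        for len1 in range(1, max_len + 1):
            if i + len1 > n:
                break
            s1 = tuple(units[i:i + len1])
            seq_counts[s1] += 1
            j = i + len1  # assumed: s2 begins right where s1 ends
            for len2 in range(1, max_len + 1):
                if j + len2 > n:
                    break
                s2 = tuple(units[j:j + len2])
                pair_counts[(s1, s2)] += 1
    # Pair count divided by the count of the first sequence.
    return {
        (s1, s2): count / seq_counts[s1]
        for (s1, s2), count in pair_counts.items()
    }
```

For the toy string `list("abab")` with `max_len=2`, the entry for `(("a",), ("b",))` is 1.0 and the entry for `(("a",), ("b", "a"))` is 0.5. Because successors of different lengths overlap, the values for a given first sequence need not sum to one in this sketch; it only provides a starting point of the kind the abstract says is later re-estimated.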
