
Apparatus and method for estimating, from sparse data, the probability that a particular one of a set of events is the next event in a string of events


Abstract

Apparatus and method for evaluating the likelihood of an event (such as a word) following a string of known events, based on event sequence counts derived from sparse sample data. Event sequences, or m-grams, include a key and a subsequent event. For each m-gram which was counted in the sample data, there is stored a discounted probability P generated by applying, for example, a modified Turing's estimate to a count-based probability. For a key occurring in the sample data, there is stored a normalization constant α which (a) adjusts the discounted probabilities for multiple counting, if any, and (b) includes a freed probability mass allocated to m-grams which do not occur in the sample data. To determine the likelihood of a selected event following a string of known events, a "backing off" scheme is employed in which successively shorter included keys (of known events), each followed by the selected event (representing m-grams), are searched (302, 308) until an m-gram is found having a discounted probability stored therefor. The normalization constants (306, 312) of the longer searched keys, for which the corresponding m-grams have no stored discounted probability, are combined together with the found discounted probability to produce (304, 310, 314) the likelihood of the selected event being next.
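The back-off lookup described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `backoff_probability`, the table names `discounted` and `alpha`, and all numbers in the toy tables are hypothetical. Two behaviours are assumptions that the abstract does not state: a key with no stored normalization constant contributes a factor of 1.0, and an event that was never counted, even as a 1-gram, yields 0.0.

```python
from typing import Dict, Tuple

Key = Tuple[str, ...]


def backoff_probability(
    key: Key,
    event: str,
    discounted: Dict[Key, float],
    alpha: Dict[Key, float],
) -> float:
    """Estimate the likelihood of `event` following `key` by backing off.

    `discounted` maps each m-gram counted in the sample data (key + event)
    to its stored discounted probability. `alpha` maps each key seen in the
    sample data to its stored normalization constant.
    """
    weight = 1.0
    while True:
        mgram = key + (event,)
        if mgram in discounted:
            # Found a stored discounted probability: combine it with the
            # normalization constants of the longer keys already tried.
            return weight * discounted[mgram]
        if not key:
            # Assumption: an event never counted at all gets probability 0.
            return 0.0
        # Assumption: a key without a stored constant contributes 1.0.
        weight *= alpha.get(key, 1.0)
        key = key[1:]  # back off: drop the oldest known event


# Hypothetical toy tables, for illustration only.
discounted = {
    ("the", "cat", "sat"): 0.4,
    ("cat", "ran"): 0.2,
    ("slept",): 0.05,
}
alpha = {
    ("the", "cat"): 0.5,
    ("cat",): 0.3,
}

for word in ("sat", "ran", "slept", "flew"):
    print(word, backoff_probability(("the", "cat"), word, discounted, alpha))
```

With these toy tables, "sat" is read directly from the 3-gram entry, "ran" is the 2-gram entry scaled by the constant of the key ("the", "cat"), and "slept" is the 1-gram entry scaled by the constants of both longer keys.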

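Bibliographic data

  • Publication number: EP0245595A1

  • Publication date: 1987-11-19

  • Original format: PDF

  • Applicant/patentee: INTERNATIONAL BUSINESS MACHINES CORPORATION

  • Application number: EP19870102789

  • Inventor: KATZ SLAVA M.

  • Filing date: 1987-02-27

  • Classification: G10L5/06

  • Country: EP

  • Date added to database: 2022-08-22 06:56:14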
