System and method for providing lossless compression of n-gram language models in a real-time decoder
Abstract
Systems and methods for losslessly compressing n-gram language models for use in real-time decoding, whereby the size of the model is significantly reduced without increasing the decoding time of the recognizer. Lossless compression is achieved using several techniques. In one aspect, the n-gram records of an n-gram language model are split into (i) a set of common history records, each covering a subset of n-tuple words that share a common history, and (ii) sets of hypothesis records associated with the common history records. The common history records are separated into a first group, in which each record has only one associated hypothesis record, and a second group, in which each record has more than one. Records of the first group are stored together with their corresponding hypothesis record in an index portion of the memory block that holds the n-gram language model; records of the second group are stored in the index together with addresses pointing to the memory location of the corresponding hypothesis records. Other compression techniques include, for instance: mapping the word records of the hypothesis records to word numbers and storing the difference between successive word numbers; segmenting the addresses and storing indexes to the addresses in each segment to multiples of the addresses; storing word records and probability records as fractions of bytes such that each pair of word-probability records occupies a multiple of bytes, and storing flags indicating the length; and storing the probability records as indexes to sorted count values that are used to compute the probability at run time.
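The index split and the word-number difference coding described in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation: the function names (`build_index`, `delta_encode`, `delta_decode`, `lookup`), the trigram order, the use of raw counts as payload, and the tuple-based layout are all assumptions made for illustration. Address segmentation, sub-byte packing with length flags, and the sorted count table are omitted.

```python
from collections import defaultdict


def delta_encode(hyps):
    """Replace each word number with its difference from the previous one.

    `hyps` must be sorted by word number, so every difference is non-negative.
    """
    out, prev = [], 0
    for word, count in hyps:
        out.append((word - prev, count))
        prev = word
    return out


def delta_decode(block):
    """Invert delta_encode by accumulating the stored differences."""
    out, word = [], 0
    for diff, count in block:
        word += diff
        out.append((word, count))
    return out


def build_index(trigrams):
    """Split trigram records into history entries and hypothesis blocks.

    `trigrams` maps (w1, w2, w3) word-number tuples to a count (assumed layout).
    Histories with one hypothesis keep it inline in the index; histories with
    several hypotheses store an address into `hyp_blocks` instead.
    """
    by_history = defaultdict(list)
    for (w1, w2, w3), count in trigrams.items():
        by_history[(w1, w2)].append((w3, count))

    index, hyp_blocks = {}, []
    for history in sorted(by_history):
        hyps = sorted(by_history[history])
        if len(hyps) == 1:
            word, count = hyps[0]
            index[history] = ("inline", word, count)
        else:
            index[history] = ("addr", len(hyp_blocks))
            hyp_blocks.append(delta_encode(hyps))
    return index, hyp_blocks


def lookup(index, hyp_blocks, w1, w2, w3):
    """Return the stored count for (w1, w2, w3), or None if it is absent."""
    entry = index.get((w1, w2))
    if entry is None:
        return None
    if entry[0] == "inline":
        _, word, count = entry
        return count if word == w3 else None
    for word, count in delta_decode(hyp_blocks[entry[1]]):
        if word == w3:
            return count
    return None


# Hypothetical toy model: word numbers and counts are made up.
trigrams = {(1, 2, 3): 5, (1, 2, 10): 2, (1, 2, 11): 7, (4, 5, 6): 1}
index, hyp_blocks = build_index(trigrams)
```

In this toy model the history `(4, 5)` has a single hypothesis and is resolved from the index alone, while `(1, 2)` follows an address to a block whose word numbers 3, 10, 11 are stored as the differences 3, 7, 1. Small differences are what make the sub-byte packing mentioned in the abstract worthwhile.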