Reduction of the search area in the case of speech recognition with the use of phoneme limits and phoneme classes

Abstract

A method is described for estimating the probability of phone boundaries, as well as the accuracy of the acoustic modelling, in order to cut down the search space in a speech recognition system. The accuracy of the acoustic modelling is quantified by the rank of the correct phone. The invention includes a microphone for converting an utterance into an electrical signal. The signal from the microphone is processed by an acoustic processor and a label match, which finds the best-matched acoustic label prototype in the acoustic label prototype store. A probability distribution on phone boundaries is then produced for every time frame using the first decision tree described in the invention. These probabilities are compared to a threshold, and some time frames are identified as boundaries between phones. An acoustic score is computed for all phones between every given pair of hypothesized boundaries, and the phones are ranked on the basis of this score. The second decision tree is traversed for every time frame to obtain the worst-case rank of the correct phone at that time, and, using the phone scores and phone ranks already computed, a shortlist of allowed phones is made up for every time frame. This information is used to select a subset of the acoustic word models in store, and a fast acoustic word match processor matches the label string from the acoustic processor against this subset of abridged acoustic word models to produce an utterance signal. The utterance signal output by the fast acoustic word match processor comprises at least one word; in general, however, it will contain a number of candidate words. Each word signal produced by the fast acoustic word match processor is input into a word context match, which compares the word context to language models in store and outputs at least one candidate word. From the recognition candidates produced by the fast acoustic match and the language model, the detailed acoustic match matches the label string from the acoustic processor against detailed acoustic word models in store and outputs a word string corresponding to the utterance.
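
The shortlisting steps in the abstract (threshold the boundary probabilities, rank the phones within each segment, then keep the top phones up to the predicted worst-case rank) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the toy numbers, the convention that a higher score is better, and the simplification that segments lie between consecutive boundaries only. The two decision trees are not modelled; their outputs are passed in as plain lists (`boundary_probs` and `worst_rank`). It is not the patent's implementation.

```python
from typing import Dict, List, Sequence, Set


def find_boundaries(boundary_probs: Sequence[float], threshold: float) -> List[int]:
    """Return the frames whose boundary probability exceeds the threshold."""
    return [t for t, p in enumerate(boundary_probs) if p > threshold]


def rank_phones(scores: Dict[str, float]) -> List[str]:
    """Order phones from best to worst score (assumption: higher is better)."""
    return sorted(scores, key=lambda phone: scores[phone], reverse=True)


def shortlist_per_frame(
    n_frames: int,
    boundaries: Sequence[int],
    segment_scores: Sequence[Dict[str, float]],
    worst_rank: Sequence[int],
) -> List[Set[str]]:
    """Build one set of allowed phones per frame.

    segment_scores[i] holds the phone scores for the i-th segment between
    consecutive cut points; worst_rank[t] is the predicted worst-case rank
    of the correct phone at frame t (1 = only the top phone is kept).
    """
    cuts = sorted({0, *boundaries, n_frames})
    shortlists: List[Set[str]] = []
    for seg, (start, end) in enumerate(zip(cuts, cuts[1:])):
        ranked = rank_phones(segment_scores[seg])
        for t in range(start, end):
            shortlists.append(set(ranked[: worst_rank[t]]))
    return shortlists


# Toy inputs (hypothetical values, standing in for the decision-tree outputs).
boundary_probs = [0.1, 0.2, 0.9, 0.1, 0.3, 0.8, 0.2]
worst_rank = [1, 2, 1, 1, 3, 2, 1]
segment_scores = [
    {"AA": -1.0, "IY": -3.0, "S": -5.0},  # frames 0-1
    {"S": -0.5, "AA": -2.0, "IY": -4.0},  # frames 2-4
    {"IY": -1.5, "S": -2.5, "AA": -6.0},  # frames 5-6
]

boundaries = find_boundaries(boundary_probs, threshold=0.5)
shortlists = shortlist_per_frame(
    len(boundary_probs), boundaries, segment_scores, worst_rank
)
for t, allowed in enumerate(shortlists):
    print(t, sorted(allowed))
```

In this sketch a larger worst-case rank widens the shortlist for that frame, so frames where the acoustic model is expected to be less reliable prune fewer phones. How the per-frame shortlists are then used to select the subset of word models is not specified in the abstract and is left out here.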


Bibliographic data

Publication number: DE69518723T2

Patent type:
Publication date: 2001-05-23

Original format: PDF
Applicant/patentee: INTERNATIONAL BUSINESS MACHINES CORP. ARMONK

Application number: DE1995618723T
Inventors: NAHAMOO DAVID; PADMANABHAN MUKUND

Filing date: 1995-06-21
Classification: G10L15/04; G10L15/28; G10L15/14
Country: DE
Date added to database: 2022-08-22 01:08:32
