A method for automatic recognition of spoken words

Abstract

981,383. Identifying spoken words. INTERNATIONAL BUSINESS MACHINES CORPORATION. Aug. 28, 1961 [Aug. 29, 1960], No. 30960/61. Heading G4R.

In a system for the recognition of spoken words, means are provided to derive an electric signal representing the sound, together with circuits responsive to a number of selected properties of the signal which vary over the duration of the word, and further circuits, controlled by the time of operation of the first circuits, which identify particular characteristics in the sound.

A system arranged to recognise the spoken digits "zero" to "nine" consists of a microphone 20 (Fig. 2), an amplifier 21 and six detector circuits 24-29 to which the signal is applied. The voicing detector 24 responds to an asymmetric characteristic found in the vocal cord sounds of speech. These sounds generally represent the vowel sounds, as opposed to the frictional and other consonant sounds. The circuits 25-27 respond to specific vowel characteristics to distinguish particular words. Circuit 25 gives an output when the vowel sound of "one" is present but not when "nine" is present. Circuit 26 responds to the sound "four" but not "three", and circuit 27 distinguishes "two" from "seven" by giving an output only when "seven" is present. Two further circuits 28, 29 respond to strong frictional sounds (such as "s", hard "t" and "x") and weak frictional sounds (such as "f", "v" and soft "t") respectively.

The circuits 24-29 are each connected to relays in the "sound increment sequence register" 16. The relay contacts are interconnected as shown in Fig. 3 to obtain further signals: "weak friction early" (K2), "strong friction early" (K3), "voicing and friction" (K4), "weak friction late" (K5) and "strong friction late" (K6). "Early" and "late" indicate that the frictional sound comes before or after the voice sound. Contacts of the relays K1-K11 are connected in a network (Fig. 4) to indicate the presence of particular combinations representing the ten digits. "Zero", for example, gives a voicing and friction signal which comes from the "z" sound; relays K1 and K4 give an output on the "zero" line in Fig. 4. Other digit words are identified in a similar way.

Circuits 24-29: The voicing detector 24 measures the difference between the peak of the positive envelope of the word signals and the peak of the negative envelope. The signals are generally complex waves rather like damped oscillations. The signals are applied to a phase shifting circuit which passes all frequencies of interest. This consists of a transistor 50 having a network consisting of an adjustable resistor 60 and capacitor 61. The output is applied via a transformer 63 to oppositely poled diodes, each having a capacitor 68, 73 and coupled to a junction point 70 through resistors. A voice signal produces an out-of-balance between the two capacitors 68, 73 and a corresponding signal output at terminal 70. The "m" and "n" sounds, called "machine vowel sounds", give a balanced signal and no output at terminal 70.

Adjustment of resistor 60 alters the response to different voicing sounds and may be used to distinguish between "three" and "four", the former giving a positive response and the latter a negative one. With another adjustment "one" and "nine" can be distinguished in the same way. By further adjustment a pulse of one polarity may be followed by an opposite pulse in response to particular conditions. These responses can be identified by suitable circuits; for example, a multivibrator can be set by the pulse of the first polarity and its output used to enable, for a predetermined period, a gate for the second pulse.

The circuit 27 distinguishing "two" from "seven" comprises a high pass filter 100 (Fig. 10) and a low pass filter 102, the outputs being applied through oppositely poled diodes to integrating circuits. The outputs are additively combined in resistor 122. The outputs for "two" and "seven" are of opposite polarity.

Circuit 28, shown in Fig. 8, consists of a high pass filter 80 (passing signals above 5000 cycles per second), the output of which is applied through adjustable resistor 81 and diode 82 to integrating capacitor 84. A threshold device may be connected to respond to strong friction signals.

Circuit 29, which detects weak friction sounds, is shown in Fig. 9. The input signals are applied to a high gain clipper amplifier 87 to give a series of rectangular pulses, which trigger a multivibrator 88 to give a series of short pulses, one for each zero crossing of the input signal. The rectifying and integrating circuit 90, 91, 93, 94 serves to measure the number of zero crossings occurring in a certain time period. An output of a certain value, detected by a threshold device, indicates a weak friction sound.

Double vowel words: The system may be extended to recognise double voice sound words by switching the first part of a word signal to a first register and, after the detection of a machine syllable, switching the second part to a second register. The outputs of the two registers are combined to identify the word. The syllable detector may respond to the occurrence of a second voice sound signal.
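As an illustration only, two of the detector principles described above can be approximated on sampled data: the zero-crossing count of circuit 29 and the positive/negative peak comparison of the voicing detector 24. The patent describes analogue circuits, so everything below is an assumption made for the sketch: the function names, the sample rate, the threshold parameters, and the omission of the phase shifting network that precedes the peak comparison in circuit 24.

```python
def count_zero_crossings(samples):
    """Count sign changes in a sampled signal, ignoring exact zeros.

    Digital stand-in for the clipper 87 and multivibrator 88, which
    emit one short pulse per zero crossing of the input signal.
    """
    crossings = 0
    previous_sign = 0
    for value in samples:
        sign = (value > 0) - (value < 0)
        if sign == 0:
            continue  # an exact zero carries no sign information
        if previous_sign != 0 and sign != previous_sign:
            crossings += 1
        previous_sign = sign
    return crossings


def is_weak_friction(samples, sample_rate, min_crossings_per_second):
    """Threshold the zero-crossing rate over the analysed window.

    `sample_rate` and `min_crossings_per_second` are hypothetical
    parameters; the source gives no numeric threshold for circuit 29.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    duration_seconds = len(samples) / sample_rate
    rate = count_zero_crossings(samples) / duration_seconds
    return rate >= min_crossings_per_second


def envelope_asymmetry(samples):
    """Positive peak minus the magnitude of the negative peak.

    Mirrors the quantity the voicing detector 24 is said to measure.
    A balanced waveform gives a value near zero.
    """
    return max(samples) + min(samples)


def is_voiced(samples, min_asymmetry):
    """Report voicing when the peak imbalance exceeds a chosen level.

    `min_asymmetry` is a hypothetical threshold, not a value from the source.
    """
    return abs(envelope_asymmetry(samples)) >= min_asymmetry
```

In use, a window of microphone samples would be passed to `is_weak_friction` and `is_voiced`, and the two results would play the role of the detector outputs that drive the relays of register 16. The thresholds would have to be tuned per speaker and recording setup, much as the adjustable resistors are tuned in the circuits described.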

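Bibliographic data

  • Publication number: DE1422040A1
  • Publication date: 1971-09-30
  • Original format: PDF
  • Applicant/patentee: INTERNATIONAL BUSINESS MACHINES CORP.
  • Application number: DE19611422040
  • Inventor: DERSCH, WILLIAM C.
  • Filing date: 1961-08-28
  • Classification: G10L1/00
  • Country: DE
  • Record added: 2022-08-23 09:54:05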
