Artificial Auditory Recognition in Telephony

Abstract

Machines which automatically recognize patterns from a stream of acoustic events, for example a spoken command, would have great utility in both communications and data processing. This paper reviews two applications of an elementary recognizer to the problem of actuating certain logical functions, and indicates how more ambitious recognizers might be utilized. In this regard, the automatic measurement of a talker's voice pitch and voicing dynamics appears fundamental to speech analysis, and hence to many recognition schemes. Visual inspection of spectral data taken from different speakers supports this contention. Segmentation of speech into discrete units suitable for recognition, including the possibility of overlapping elements, is discussed. There is reason to expect that such segments will span several elementary speech sounds (phonemes). To illustrate this approach, a set of rules is presented for associating visual spectral displays (sound spectrograms) with the perception evoked by the corresponding utterances. These rules are specifically tailored for a limited vocabulary consisting of ten spoken numbers, and were validated by naive subjects who used them to identify the utterances of 33 people. In a further experiment, spectrograms of the same material from 14 talkers were simplified by reducing them to binary elements. It was found that master patterns for each number, compiled from the ensemble of talkers, could identify the utterances with over 99% success. These results emphasize a “diversity” approach to speech recognition which operates on relations between gross spectral features and does not depend exclusively on any one property.
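The abstract does not spell out how the binary master patterns were compiled or matched. As a rough illustration only, the Python sketch below shows one way binary spectrogram templates could be built and compared. Several parts of it are hypothetical and are not taken from the paper: the mean-energy threshold, the per-cell strict-majority vote across talkers, the cell-agreement score, the assumption that every spectrogram has already been resampled onto the same time-by-frequency grid, and all function names.

```python
from typing import Dict, List

Grid = List[List[float]]   # spectrogram: rows = time slices, columns = frequency bands
Binary = List[List[int]]   # the same grid reduced to binary elements (0 or 1)


def binarize(spectrogram: Grid) -> Binary:
    """Reduce a spectrogram to binary elements.

    Hypothetical rule: a cell becomes 1 if its energy exceeds the mean energy
    of the whole utterance, otherwise 0.
    """
    cells = [v for row in spectrogram for v in row]
    threshold = sum(cells) / len(cells)
    return [[1 if v > threshold else 0 for v in row] for row in spectrogram]


def master_pattern(examples: List[Binary]) -> Binary:
    """Compile one master pattern from several talkers' binary patterns.

    Hypothetical rule: a cell is 1 only if a strict majority of talkers have 1 there.
    Assumes all examples share the same grid shape.
    """
    n = len(examples)
    rows, cols = len(examples[0]), len(examples[0][0])
    return [
        [1 if 2 * sum(ex[r][c] for ex in examples) > n else 0 for c in range(cols)]
        for r in range(rows)
    ]


def agreement(a: Binary, b: Binary) -> int:
    """Count the grid cells on which two binary patterns agree."""
    return sum(x == y for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def identify(pattern: Binary, masters: Dict[str, Binary]) -> str:
    """Return the label whose master pattern agrees with the input on the most cells."""
    return max(masters, key=lambda label: agreement(pattern, masters[label]))
```

For example, with `masters = {"one": master_pattern(...), "two": master_pattern(...)}` built from several talkers per word, `identify(binarize(new_spectrogram), masters)` returns the word whose template matches the new utterance on the largest number of binary cells. A whole-template comparison like this is only a simple stand-in for the relational, multi-feature "diversity" matching that the abstract describes.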

Bibliographic record

  • Source
    IBM Journal of Research and Development | 1958, No. 4 | pp. 294-309 | 16 pages
  • Original format: PDF